AN EXPERIMENTAL COMPARISON OF THE
STUDY-TEST AND TEST-STUDY METHODS 
IN SPELLING 


ARTHUR I. GATES 
Teachers College, Columbia University 


The present study is one of a series of investigations planned to 
inquire into the possibilities of improving instruction in spelling. 
Investigations accepted as dissertations for the Doctorate at Teachers
College by Robert Thompson, Herbert A. Carroll, James E. Mendenhall,
Claire T. Zyve, Ina C. Sartorius, and W. H. Coleman,¹ deal with
closely related problems, and others will be completed shortly.


The present investigation is an inquiry concerning the gross 
efficiency of the two most widely used general methods: the Pre-study 
or Study-test method and the Pre-test or Test-study plan. In some 
measure, the investigation provides data which suggest the inherent
limitations and merits of the two rival procedures and which indicate 
possible improvements in each. The study also provides several clues 
for further research for the purpose of appraising modifications of 
possible value. 

The literature affords few experimental comparisons of the Pre-study
and the Pre-test methods. Keener’s study,² based upon nine
hundred seventy-six pupils in Grades II to VIII of the Chicago
schools, seems to show that there is a slight superiority of the Pre-study
method in Grades II and III, and superiority of the Pre-test plan in
Grades IV to VIII. A study by Woody,¹ confined to pupils in Grades
VI, VII, and VIII, gave no consistent advantage to either plan.
Kilzer,² in a study confined to ninth-grade pupils, found, according to his
methods of computing results, that while the test-study method gave
better immediate results, the advantage was not apparent in tests
after an interval of six months.

1 Thompson, R.: “The Effectiveness of Modern Spelling Instruction,”
Teachers College, Columbia University, Contributions to Education, No. 436; Carroll,
H. A.: “Generalization of Bright and Dull Children: A Comparative Study with
Special Reference to Spelling,” Teachers College, Columbia University,
Contributions to Education, No. 439; Mendenhall, J. E.: “An Analysis of Spelling Errors,”
Teachers College Bureau of Publications, 1930; Zyve, C. T.: “A Comparative
Study of Methods” (in press); Sartorius, I. C.: “The Bases of Generalization in
Spelling” (in preparation); Coleman, W. H.: “Studies of Spelling Vocabularies”
(in preparation).

2 Individual Method vs. Group Method of Teaching Spelling. Fourth Year
Book of the Department of Superintendence, Washington, D. C., 1926.


THE STUDY-TEST PROGRAM


For the Study-test procedure the same words were used as for 
the Test-study method. The number per week varied with the grade. 
Each week’s assignment was divided into four equal parts. One of 
these short lists was studied on Monday, Tuesday, Wednesday, and 
Thursday, respectively. Friday was devoted to a test on the week’s 
assignment, followed by review when time permitted. 


The method of introducing the words was substantially as follows: 


1. The teacher pronounced the word clearly. If the word contained two or
more syllables, it was pronounced by syllables.
2. The word was used in one or more sentences. In some cases pupils were
asked to use the word in a sentence.
3. The teacher wrote the word on the blackboard and had the children say it.
4. The pupils looked at the word and said it, syllable by syllable, to
themselves.
5. The pupils looked at the word and said the letters to themselves. The
pupils were encouraged to group the letters by syllables as they said the letters.
6. They looked at the word as a whole as they said it to themselves.
7. Same as (5).
8. Pupils closed their eyes and said the letters to themselves. In some
instances the teacher asked one pupil to do this aloud while the others listened.
9. Pupils wrote the word as they said the letters to themselves. They
proceeded by syllables whenever possible.
10. They compared their written word with the correct form on the board.
11. Pupils covered their word, wrote it again and compared it with the correct
form.
12. They repeated (11) until they could write the word without error.





1 The Evaluation of Two Methods of Teaching Spelling. Fifteenth Yearbook of
the National Society of College Teachers of Education, University of Chicago Press,
1927.

2 Kilzer, L. R.: The Test-study vs. the Study-test Method in Teaching Spelling.
School Review, 1926, pp. 521-525.


The above plan was modified in details to meet the particular
needs of the different groups, which varied from the dullest
second-grade to the brightest eighth-grade classes.

Near the end of the period the pupils were given a test of the words
studied during the day. If a word was missed by more than half the
class, it was added to the list for the next day.

On Friday a test of all words taught during the week was
conducted. Words frequently missed were put into a review list. Those
proving to be particularly difficult were retaught, if possible, during
the following week. Others were retained for a review period about a
month later.
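The bookkeeping of the Study-test week can be sketched in a few lines. The following is only an illustration under stated assumptions: the function names `split_weekly_list` and `carry_over` are hypothetical, the week's list is split in its given order, and "missed by more than half the class" is read as a strict majority.

```python
from collections import Counter


def split_weekly_list(words, parts=4):
    """Divide the week's words into `parts` consecutive daily lists.

    The lists are equal in size when len(words) is divisible by `parts`;
    otherwise the earlier days receive one extra word each (an assumption,
    since the article only speaks of "four equal parts").
    """
    size, extra = divmod(len(words), parts)
    daily_lists, start = [], 0
    for day in range(parts):
        end = start + size + (1 if day < extra else 0)
        daily_lists.append(words[start:end])
        start = end
    return daily_lists


def carry_over(day_words, misses_by_pupil, class_size):
    """Return the day's words missed by more than half the class.

    `misses_by_pupil` holds one list of misspelled words per pupil.
    These words are added to the next day's list.
    """
    counts = Counter(w for missed in misses_by_pupil for w in set(missed))
    return [w for w in day_words if counts[w] > class_size / 2]


# Example with made-up words: eight words give four daily lists of two.
monday, tuesday, wednesday, thursday = split_weekly_list(
    ["cat", "dog", "ship", "tree", "house", "water", "green", "their"]
)
```

The Friday test and the monthly review list would follow the same counting pattern, with a threshold chosen by the teacher.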


THE TEST-STUDY PLAN


The Test-study plan employed was in general outline the plan 
recommended by Horn¹ in 1919 and since widely used. The schedule
is as follows: 


Monday.—Test on all words in the new assignment for the week.

Tuesday.—Individuals study the words missed on the Monday test. Pupils
making no errors on the Monday test are excused.

Wednesday.—Test on new and review words. All pupils take this test.

Thursday.—Study of words missed on Wednesday test. Pupils making no
errors are excused.

Friday.—Test on same words as used on Wednesday for all pupils. Study
missed words as far as time permits.


Once a month a review test was given comprising the most difficult 
words used during the preceding month. 
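The individual adjustment in the schedule above, by which each pupil studies only his own missed words and is excused when he has none, can be illustrated as follows. This is a minimal sketch; the names `week_for_pupil` and `is_excused` are hypothetical and the word lists are invented.

```python
def week_for_pupil(monday_missed, wednesday_missed):
    """Build one pupil's study lists for the two study days.

    Tuesday's list is the set of words missed on the Monday test and
    Thursday's list the set missed on the Wednesday test.
    """
    return {
        "Tuesday": sorted(set(monday_missed)),
        "Thursday": sorted(set(wednesday_missed)),
    }


def is_excused(plan, day):
    """A pupil with nothing to study on a study day is excused."""
    return not plan[day]


# A pupil who spelled every word on Monday but missed one on Wednesday.
plan = week_for_pupil(monday_missed=[], wednesday_missed=["their"])
excused_tuesday = is_excused(plan, "Tuesday")    # True
excused_thursday = is_excused(plan, "Thursday")  # False
```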


THE EXPERIMENTAL SCHEDULE 


The experiment was conducted during the period from February 
1 to June 30, 1928. The period was divided into two parts. Each 
period consisted approximately of nine weeks for teaching and one 
week for testing. During the first period half of the classes used the 
Study-test method and half the Test-study. During the second 
period the methods were reversed. Thus, each pupil spent approximately
ten weeks with one method and ten with the other. This
scheme makes two types of comparisons possible: (1) results obtained
from groups using the two different methods at the same time and with
the same words may be compared, and (2) results from the same group
obtained from one method in the first period may be compared with
those obtained under the other method in the second period.

1 Principles of Method in Teaching Spelling as Derived from Scientific
Investigations. Eighteenth Yearbook of the National Society for the Study of Education,
Part II.
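The reversal of methods between the two periods amounts to a simple crossover arrangement, which may be sketched as below. The function name `crossover_schedule` is hypothetical, and splitting the class list at its midpoint is an assumption made only for illustration; the actual assignment of classes is described under Results.

```python
def crossover_schedule(classes):
    """Map each class to its (first-period, second-period) method.

    Half of the classes begin with Study-test and half with Test-study;
    every class changes to the other method in the second period.
    """
    half = len(classes) // 2
    schedule = {}
    for index, name in enumerate(classes):
        first = "Study-test" if index < half else "Test-study"
        second = "Test-study" if first == "Study-test" else "Study-test"
        schedule[name] = (first, second)
    return schedule


schedule = crossover_schedule(["Class A", "Class B", "Class C", "Class D"])
```

Comparison (1) reads down one period across classes; comparison (2) reads across the two periods for one class.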


THE SUBJECTS 


The subjects originally comprised nearly 1800 pupils in fifty-four
classes in Grades II to VIII of Public School 210, Brooklyn, under
Principal Frederick C. Graham. The numbers of records fully complete
and usable in computing results were as follows:


First period ........................ 49 classes   1558 pupils
Second period ....................... 49 classes   1678 pupils
Total number of complete records ................. 3236


For the equivalent group studies, which necessitated the elimination
of records to secure equivalent scores, the total number of complete
pupil-records was 2900.


PRELIMINARY AND FINAL TESTS 


The same list of words was used for a preliminary and final test. 
The list consisted of fifty words chosen at random from the words to be 
studied by each class during the ten weeks of the experiment. The
tests were given by the familiar column dictation method. The number
of words correct was multiplied by two to convert the scores into
percentages correct.

The IQ’s obtained by group-testing in the routine work of the 
school were secured from the school records and used in comparing 
groups. That the initial spelling scores for the classes correspond 
very closely to the intelligence ratings of the classes at a given grade 
level is apparent in the data of Table I. It is also obvious that, probably
as a result of frequent examinations with the group tests, the
IQ’s, beginning with Grade III, run high. It is believed, however,
that the population of the school as a whole is close to average in 
native intellectual capacity. 


RESULTS


Comparison of Classes within a Given Grade.—Since the pupils in
the school are classed on the basis of intelligence tests, it was impossible
to secure groups of unselected pupils within a given grade. A schedule
was therefore adopted which would give as nearly as possible two
equivalent combinations of classes within each grade. This was
usually done by combining the brightest and dullest classes in one
group and the middle classes in the other. The extremes were assigned
to the Study-test plan and the middle groups to the Test-study plan
in one grade, the reverse in the next, and so on alternately. Defaults
in some of the classes upset the scheme in certain classes.

TABLE I.—COMPARISONS OF CLASSES USING STUDY-TEST AND TEST-STUDY PLANS

             Study-test                              Test-study
Number      IQ  Initial  Final   Gain   Number      IQ  Initial  Final   Gain

Grade II
    32      94       55     97   44.5       42      88     34.3   85.5   41.2
    26      80     12.5   58.5   46.0       28      84     22.1   63.8   41.7
   *58    87.2     34.0   79.9   45.9      *70    86.6     29.7   75.0   45.3

Low Grade III
    38   109.4     52.8   90.5   37.7       39   121.3     73.4   98.1   24.7
    32    99.8     31.2   73.5   42.3       27    91.3     28.4   66.1   37.7
   *70   105.0     44.4   83.1   38.7      *66   108.5     52.5   85.6   33.1

High Grade III
    35   126.1     84.5   98.3   13.8       34   106.6     64.4   93.6   29.2
    36   116.6     67.4   92.3   22.9       27   102.0     49.3   79.1   29.8
    22    91.5     42.9   73.0   30.1
   *93   112.2     64.4   90.2   25.8      *61   105.2     59.5   87.4   27.9

Grade IV
    32   118.6     80.3   97.2              24    99.3     67.3   91.9   24.6
    24    92.3     53.3   89.2   35.9       24    83.6     44.9   82.4   27.5
    31   107.8     66.0   88.2              33   123.2     80.5   99.1
    28    86.7     47.6   89.7              31    98.5     63.5   93.5
  *115  101.35     61.8   90.8            *108  101.15    64.05   91.2

Grade V
    38   125.3     79.8   97.8              41   110.8     77.4   94.2
    29   102.1     58.6   93.6              28    92.0     53.5   88.9
    29   101.1     76.3   95.8              27    87.5     58.7   84.5
   *96   110.1     72.1   96.1             *96   100.1     64.2   91.1

Grade VI
    30   125.7     76.3   96.2              28   148.6     88.3   98.6
    30   104.2     59.6   88.5              42   115.1     66.4   96.7
    27   108.4     65.8   91.6              31    90.3     55.6   79.7
                                            37   106.6     64.5   93.1
   *87   112.7     67.2   92.1            *138   115.1     68.7   92.0

Grade VII
    39   126.1     81.6   94.8              37   121.6     72.1   94.8
    35   109.3     68.1   92.0              23   102.0     64.7   87.5
    36   150.4     86.4   97.7              32   120.0     70.0   94.2
    28    98.4     68.1   91.3              25   108.6     65.7   89.2
  *138   122.0     76.0  93.95            *117   114.5     70.1   92.1

Grade VIII
    33   137.5     73.8   94.5              22    99.5     61.8   89.5
    31   107.2     66.3   90.9              30   115.2     70.4   94.0
    30    99.2     56.8   89.9              37   126.8     73.0   96.4
    34   117.5     62.0   93.4              28   110.6     57.0   94.9
  *128   115.1     64.8   93.0            *117   114.2     65.5   94.6

* Lines marked with asterisk are totals or averages.

The records available for the first experiment are shown in Table
I. In this table are given for each group the number of pupils, the
mean IQ, the mean initial spelling score in percentage correct, the
mean final spelling score, and the mean gain. The gain is merely
the difference between the mean initial and mean final scores. The
last line under each grade gives the number of pupils, the mean IQ,
initial and final scores, and gain for all the pupils of that grade. These
figures were obtained not from the preceding class-scores but from the
individual records of the pupils in the groups.

Since a study by Robert S. Thompson¹ revealed serious difficulty
in comparing gains of groups which differ in initial scores, the data in
Table I are not readily interpreted when the initial scores are unequal.
Dr. Thompson found that, other things being equal, pupils obtaining
a low initial score gain more, in terms of percentages of correct spellings,
than those obtaining higher initial scores. He found that, in general,
equal teaching and learning would produce advances covering equal
fractions of the interval between the initial percentage correct and one
hundred per cent correct. Thus, if one group advances, say, half of the
distance from an initial score of forty per cent to one hundred per cent
correct, that is, to seventy per cent correct, another group with an initial
score of sixty should advance to $60 + \frac{100 - 60}{2}$, or 80, as the
result of equal learning. While these relations given by Thompson could be applied
in this case, it was felt that where differences are as small as these
appear to be the more laborious process of sifting the pupils down to
groups equivalent in initial scores would be more reliable. Table I
is offered for those who wish to make comparisons of classes by utilizing
Thompson’s data or other means.
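The relation attributed to Thompson can be written as a pair of small functions. This is a sketch of the arithmetic in the example above, not of Thompson's own procedure; the names `fraction_of_possible_gain` and `expected_final` are hypothetical.

```python
def fraction_of_possible_gain(initial, final):
    """Share of the distance from the initial score to 100 that was covered.

    Scores are percentages correct; an initial score of 100 leaves no
    room for gain, so it is rejected.
    """
    if initial >= 100:
        raise ValueError("initial score must be below 100 per cent")
    return (final - initial) / (100 - initial)


def expected_final(initial, fraction):
    """Final score reached by covering `fraction` of the remaining distance."""
    return initial + fraction * (100 - initial)


# The example in the text: 40 -> 70 covers half the distance to 100,
# so equal learning from an initial score of 60 leads to 80.
half = fraction_of_possible_gain(40, 70)
predicted = expected_final(60, half)
```

On this view two classes are compared by their fractions of possible gain rather than by their raw gains.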

1 Thompson, R.: “The Effectiveness of Modern Spelling Instruction.” Teachers
College, Columbia University, Contributions to Education, No. 436.

Comparison of Gains Made by Groups of Equivalent Initial Ability.—
In Table II the results of both studies are shown. In each case pupils
were first arranged by grades. Then pupils following the Study-test
plan were matched in initial scores with pupils following the Test-study
plan. This usually resulted in a surplus of high initial scores in one group
and of low scores in the other, which had to be eliminated. The plan
pursued was not an exact one-for-one matching, but one which gave
the largest number of cases without appreciably disturbing the
equivalence of the groups in mean and SD of initial scores. Thus, for
example, one group might be given three pupils with scores of
thirty-nine, forty, and forty-one respectively as equivalent to two scores of forty,
when the effects of this assignment could be counterbalanced.
Consequently, although the groups are substantially equivalent, the
numbers of pupils in the groups are not always the same.
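A stricter procedure than the one described, exact one-for-one matching on the initial score, is easy to state and shows what the sifting aims at. The sketch below is therefore a simplified stand-in and not the plan actually pursued, which relaxed exact matching to retain more cases; the name `match_by_initial_score` and the pupil labels are hypothetical.

```python
from collections import defaultdict


def match_by_initial_score(group_a, group_b):
    """Keep, for every initial score, equally many pupils from each group.

    Each group is a list of (pupil_id, initial_score) pairs. Pupils whose
    score has no partner in the other group are dropped, so the retained
    groups have identical distributions of initial scores.
    """
    by_score_a, by_score_b = defaultdict(list), defaultdict(list)
    for pupil, score in group_a:
        by_score_a[score].append(pupil)
    for pupil, score in group_b:
        by_score_b[score].append(pupil)

    kept_a, kept_b = [], []
    for score in sorted(by_score_a.keys() & by_score_b.keys()):
        n = min(len(by_score_a[score]), len(by_score_b[score]))
        kept_a += [(pupil, score) for pupil in by_score_a[score][:n]]
        kept_b += [(pupil, score) for pupil in by_score_b[score][:n]]
    return kept_a, kept_b


# Invented records: the surplus high score (90) and low score (10) are dropped.
study_test = [("a1", 40), ("a2", 40), ("a3", 90)]
test_study = [("b1", 40), ("b2", 10)]
kept_study_test, kept_test_study = match_by_initial_score(study_test, test_study)
```

Exact matching guarantees equal means and SDs of initial scores at the cost of discarding more records than the looser plan described in the text.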

The results revealed by Table II may be summarized by the statement
that in so far as differences are indicated at all, the Study-test
method produces larger gains in Grade II and low Grade III and the
Test-study plan yields greater gains from high Grade III to
Grade VIII inclusive.

The advantages of the Study-test plan in the second and lower third 
grades (i.e., first half of the third grade) are neither large nor, in terms 
of the Standard Error, highly reliable. Since the groups are evenly 
matched in initial spelling scores and since the advantage in IQ’s, if 
any, is enjoyed by the Test-study groups, and since the same teachers 
taught the groups under both plans, the consistency of the superiority 
in gains shown by the Study-test method is indicative of a genuine, 
even if small, difference. The average superiority of the Study-test 
plan in the two grade levels based on four hundred and seventy-seven 
pupil-records is 1.95, or approximately two per cent. The advantage 
is slightly higher (2.11) in the second grade and lower (1.36) in Grade 
III. Whether this small advantage, assuming it to be genuine, is 
sufficient to demonstrate superiority of the Study-test plan in general 
in these grades is a topic for consideration later. 

Beginning with the second half of Grade III and continuing to the
end of Grade VIII, the Test-study plan shows slightly greater gains.
Aside from one small setback in Grade IV and another in Grade VII,
the Pre-test scheme shows to advantage in all of the dozen comparisons.
The surplus in favor of this plan is consistently small and
often lacks “satisfactory” statistical reliability. The advantages,
beginning with the lowest grades, are: 0.53, 1.45, -0.50, 2.00, 2.80, 1.91,
2.84, 3.57, 1.76, -0.80, 0.44, 1.11. The average of these figures, based
on 2423 pupil-records, is 1.40. The series shows no consistent deviation
from this average surplus in favor of the Test-study plan. The
plan, in other words, beginning at the middle of Grade III, seems to
work as well, relatively, in the lower as in the upper grades.
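The average quoted above can be checked by a short computation. The text does not say how the figure of 1.40 was obtained, so the sketch below assumes, as one plausible reading, that each comparison was weighted by its number of cases; the case counts are those given for high Grade III through Grade VIII in the summary of Table II, and the variable names are illustrative.

```python
# Advantages of the Test-study plan as listed in the text, lowest grade first.
advantages = [0.53, 1.45, -0.50, 2.00, 2.80, 1.91,
              2.84, 3.57, 1.76, -0.80, 0.44, 1.11]

# Numbers of cases for the same twelve comparisons (summary of Table II).
cases = [137, 141, 218, 216, 177, 182, 210, 216, 226, 240, 232, 228]

# Simple mean of the twelve differences.
unweighted_mean = sum(advantages) / len(advantages)

# Mean weighted by the number of pupil-records behind each difference
# (an assumption about how the reported average was formed).
weighted_mean = sum(a * n for a, n in zip(advantages, cases)) / sum(cases)

total_cases = sum(cases)
```

The case counts sum to the 2423 pupil-records named in the text. The simple mean comes to about 1.43 and the weighted mean to about 1.39, so either reading agrees with the reported 1.40 to within a few hundredths.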
SUMMARY OF TABLE II.—DIFFERENCE IN FAVOR OF EACH METHOD

Grade   Number     IQ favor   IQ favor   Gain favor   Gain favor
        of cases   S-T        T-S        S-T          T-S
II      118        ...        0.24       3.51         ...
II      117        0.22       ...        0.71         ...
IIIA    120        1.10       ...        1.33         ...
IIIA    127        ...        1.70       1.40         ...
IIIB    137        ...        0.80       ...          0.53
IIIB    141        1.00       ...        ...          1.45
IV      218        0.40       ...        0.50         ...
IV      216        1.80       ...        ...          2.00
V       177        ...        2.00       ...          2.50
V       182        0.40       ...        ...          1.91
VI      210        ...        1.00       ...          2.83
VI      216        1.50       ...        ...          3.57
VII     226        ...        0.70       ...          1.76
VII     240        ...        1.00       0.80         ...
VIII    232        3.30       ...        ...          0.44
VIII    228        ...        1.10       ...          1.10
Total   2900       9.72       8.54       8.25         18.09




















COMPARISON OF RESULTS IN BRIGHT AND DULL CLASSES


An inspection of Table I suggests the possibility that the Study-test
plan shows to greatest advantage in classes of duller pupils,
whereas the Test-study method is more suited to the bright. Table
III is obtained by taking from each grade level the brightest and the
dullest classes. Each class was taught by the same teacher first by
one method and then by the other. Since the initial tests were composed
of different words and given at different times, pupils’ initial
scores were often different. To secure groups as nearly equivalent
as possible on the initial test scores the records were sifted as before.

Table III shows that in Grades II, III, and IV, the dullest pupils 
made greater gains when taught by the Study-test plan. In Grades 
V, VI, VII, and VIII they do as well when taught by the Test-study 
method. The brightest classes on the whole make larger gains when 
working by the Test-study plan. There are two classes, in Grade II 
and Grade VI, in which Test-study shows no advantage. The average 
superiority of the method is 1.61, a figure slightly greater than
the average gain of 1.40 per cent for the entire population of
the school.
In terms of the gains in ability to spell the words after an interval
of from one to ten weeks¹ after the words were studied, the results
show the Study-test method to be slightly better in the second grade and
the first half of Grade III, whereas the Test-study method is superior
thereafter. In the case of the dullest of the four or five classes at the
high third and fourth grade levels the Study-test plan is slightly
superior, whereas in the case of the brightest of these groups, the
Test-study plan is equal to Study-test in the second grade and slightly
superior thereafter. Were no other factors than these results taken
into account, the recommendation would be to use the Pre-test plan for
bright pupils from the beginning, for average pupils from the middle
of Grade III, and for the slowest pupils from the beginning of Grade V,
and to use the Pre-study method in the remaining classes. Since the
advantage of either method is small in terms of specific spelling gains,
it will be sensible to consider certain other factors.

Let us consider first certain limitations urged against the Pre-test 
method in comparison with the Pre-study procedure. One frequently 
mentioned limitation of the Test-study method lies in the fact that a 
single dictation test is not a wholly reliable means of determining 
which words a child can spell correctly. In the first place the test 
puts the child into a state of concentration on spelling. A word which 
he might misspell in casual writing or even in a dictation exercise 
in which attention is partly devoted to meaning or composition, or 
both, may be correctly spelled when attention is entirely devoted to 
spelling. Thus words insufficiently mastered are omitted from study. 
In reply to this objection it may be said, however, that the Test-study 
plan provides three tests (on Monday, Wednesday, and Friday) and 
that words known well enough to be spelled correctly three times 
probably need no further systematic drill. 

It is urged similarly that children who are not certain as to which
of two or three alternatives to use will succeed frequently in getting
the right form by chance. Thus, Kilzer² found that of words misspelled
in a second test, thirty-one per cent were spelled correctly
on the first test. Nearly a third of the words misspelled on the first
test apparently were not perfectly known and, it is urged by some,
should have been studied. In this connection the plan of giving the
pupils a brief preview of the words, as by reading them in printed
columns or in context, has been attacked, inasmuch as it would increase
the number of words which a pupil could spell immediately without
thorough mastery. The writer, in a small test of the matter, found
that twenty-one per cent of the words spelled incorrectly on a second
test were correct on the first test without preview, whereas the
percentage was thirty after the children were given two seconds per word
for a preview of the list. Kilzer’s finding that eighty-three per cent
of the words misspelled after an interval of six months were correctly
spelled in a test immediately after a preview has been interpreted
similarly.

1 Since the final test included an equal number of words from each of the
weekly lists, it occurred at intervals of from one to ten weeks after the words were
studied.

2 School Review, 1926, pp. 521-525.

These would be strong arguments were the study for the week 
determined entirely by the single Monday test. The additional tests 
on Wednesday and Friday seem to be excellently adapted to take care 
of the conditions here revealed. The plan seems fairly well to adjust 
the amount of study to the needs. With rare exceptions the words 
missed on the first test will be most poorly known and hence presumably
most in need of the greatest amount of study on Tuesday
and thereafter in case of a repeated failure on Wednesday. Contrariwise,
the mere fact that a child spells a word correctly on Monday,
even if he fails later, is evidence of a greater degree of learning than 
if he failed entirely. A word spelled correctly on both Monday and 
Wednesday is, with rare exceptions, still better known; and a word 
spelled correctly on all three days is probably known as well as words 
should ever be taught through direct drill in the spelling lesson. If 
such a word is misspelled six months later, the explanation is probably 
to be found in the interfering influence of words subsequently learned 
or to lack of use outside of the spelling lessons during the period. In 
the latter case the remedy is not more drill but better selection and 
placement of words. A word not used until six months after it has 
been taught has been introduced at least six months too soon. Instead 
of studying it more, the pupils should not have studied it at all at that 
time. 

With reference to the management of the easy words (that is,
words already known well enough to be spelled correctly in one or
more of the three tests), the Test-study plan seems to the writer to
have all the better of the argument. It is, in fact, nicely designed
to save the pupil from needless overlearning of words he can already
spell with some measure of consistency. The three tests will rarely 
let a word slip through without practice if it really needs drill at the 
time. No method can control all of the exigencies of future events. 
Concerning the treatment provided for the most poorly known
words, the advantage of the Test-study plan is usually admitted. 
This plan shows on Monday, with rare exceptions, the words that a 
pupil is least able to spell from the whole week’s list and enables a 
pupil to concentrate on these as much as needed up to the limit of 
the full week. By saving time which the Study-test plan devotes 
to study of the easier words, this plan would seem to enable the pupil 
to direct his study to the words most in need of it. 

At the same time the Test-study plan tends to exaggerate to the 
limit the effects of misplacement of words. The grade placement 
of words is far from a satisfactory condition, and in most schools 
many words are doubtless taught long before they are actually needed 
in writing. Such words will mainly be difficult words to spell because 
they have been used but seldom or never. The Test-study plan 
requires the pupil to devote a maximum amount of time in study of 
these words. Words which will not be used for a long time will now 
either be forgotten before they are needed or overlearned greatly 
to maintain their life until they are called for. In either case the time 
spent in drill is largely wasted. 

The Study-test plan, then, may tend to waste time on the study of
“easy” words already known well enough or nearly well enough for
successful usage, whereas the Test-study plan tends to waste time
upon study of “hard” words which ought not to be studied until
later. Evidence in support of this view may be found in Thompson’s
study¹ of the causes of differences in the spelling difficulty of words.
The present writer believes that his results may best be interpreted
as indicating that “easy” words, on the whole, are those most frequently
used in writing, and “hard” words those least frequently
used prior to a given test.

With the limitation in grade placement now in effect, it is probable 
that one of these deficiencies is about as bad as the other.
Theoretically, then, the contest may be called a draw up to this point.

1 Op. cit.

It should be pointed out, however, that the difficulty with the
Test-study plan is not as intrinsic and irremediable as the limitation of
the Study-test method. Remove improper grade placement, and
this wastefulness of the Test-study plan largely disappears. This will
not be so fully true of the Study-test plan. Despite the probability
that better grade placement will tend to produce daily or weekly
lists of words of more uniform difficulty, on the average the words
will still be of unequal difficulty to individuals in the class. It is
precisely in the function of adjustment to individual differences in
mastery of the various words in an assignment that the superiority
of the Test-study method lies. The most perfect grading system will
scarcely make all children equally good spellers or a given child able
to spell equally well (or poorly) all of the words in a given assignment.
With a perfect grading system, the Test-study plan would find a more
ample defense than it does under present mediocre grade placement
of spelling words.

Another defect attributed to the Pre-test plan more than to the 
Pre-study plan is the loss of study upon words of all degrees of difficulty 
due to failure of the pupil or teacher, or both, to detect errors in the 
test results. Kilzer pointed out the frequency of such errors when the 
pupils correct one another’s papers. They are likely to appear also 
when the teacher corrects the spelling. It may again be said, however,
that while an error is likely to go unnoticed occasionally in one test,
it is not very likely to be missed three times in succession. The
Test-study plan makes rather adequate provisions for catching such 
mistakes in time to give a word at least one day of practice. The 
plan of having pupils correct their own spelling by comparing their 
test papers with the printed text has several features, moreover, of 
educational value. Apparently what is needed is to encourage the 
pupil to develop a higher standard of accuracy in scoring his spellings. 
Such a habit would be useful through life as an aid in the improvement 
of ability to spell. 

Another objection offered to the Pre-test is that it makes initial 
errors not only a necessary but a natural result. Since the spelling 
of the words is not always known, errors are inevitable. Since the 
making of errors is a necessary result—taken for granted—it seems 
also natural. Thus two ill-effects are alleged to be the issue. The first
is the inculcating of an attitude of tolerance toward mistakes in
spelling. Pupils are induced to look upon their mistakes with
complacency. The second ill-effect lies in the fact that the Pre-test forces
the pupil to “practice errors.” In this connection the alleged tendency
of initial errors to persist is often emphasized. The Study-test plan, 
on the other hand, by teaching before testing, tends to set up standards 
of accuracy, to prevent initial errors, and to prevent the practicing of 
errors. 

While the writer believes that the importance of these tendencies
of the strict Pre-test method has often been overestimated, he is
inclined to believe that they are genuine and, in some measure, potent.
To what extent the making of errors during the Pre-test tends to produce
a tolerance for misspellings, if at all, is difficult to show. It is
not impossible, indeed, that testing, by revealing errors, tends to
cultivate an opposite inclination to correct them and to develop a
sensitivity to errors which ordinary writing does not properly cultivate.
The general tendency for initial errors to persist is apparently far less
strong in spelling than in some other functions. Errors in spelling
tend to be variable rather than constant. If a child cannot spell a
word he most frequently spells the word phonetically. Most words
may be spelled phonetically in almost innumerable ways, as Horn¹
and others have shown, and they are spelled in many of these ways in
repeated attempts by the same pupil. As Woody found, moreover,²
the errors made by children in preliminary tests do not tend appreciably
to persist in their original form.

Another consideration is the relation of the two methods to the 
interest and effort of the teacher and pupils. Keener’s finding that
“the majority of teachers . . . were very markedly in favor of the
[Test-study] method” and that “they favored it because of the greater
interest on the part of the pupils . . . and opportunity of giving help
where it was needed” is borne out in Public School 210. Although
the Study-test plan was more of a novelty to the pupils, they favored 
the Test-study plan by nearly ten to one in a sample of votes taken. 

A problem of great importance is that of the amount of time 
consumed by the two methods. Adherents of the Study-test method 
point out that relatively more classroom time is available for study 
when this method is used because less time is spent in testing. As 
commonly used, the Study-test plan requires two tests per word per 
week, whereas the Test-study plan requires three weekly tests. 

Adherents of the Test-study plan reply, however, that their plan 
requires less time in absolute terms—fewer pupil-minutes of work 
per week. The reason for this is that pupils are not required to study 
all words, as in the Pre-study plan, but only those missed on a test. 
Thus, if a pupil misses no words on the Monday test he is excused 
entirely from the Tuesday study period. All of those successful on 
Wednesday are excused from the Thursday period. Keener found in 
his study that about twelve per cent were excused from both periods. 





1 A Source of Confusion in Spelling. Journal of Educational Research, Jan.,
1929, pp. 47-56.

2 The Evaluation of Two Methods of Teaching Spelling. Year Book of the
National Society of College Teachers of Education, 1926.
In the classes sampled in the present study about ten per cent were
excused from both periods and about eighteen per cent from the second
period.

The saving of pupils’ efforts is by no means fully represented by 
the number of children who know how to spell all of the words in a 
week’s assignment in advance of instruction. Nearly all of the pupils 
know some of the words. In Table II the scores on the initial test 
represent the average per cent of words to be taught during a ten- 
week period which are known in advance of study. These represent 
the percentage which they would not need to study under the Pre- 
study plan. They are approximately as follows: Grade II, thirty-five 
per cent; Grade III, fifty-five per cent; Grade IV, sixty-five per cent; 
Grade V, sixty-nine per cent; Grade VI, seventy-one per cent; Grade 
VII, seventy-five per cent, and Grade VIII, sixty-eight per cent. 
The average of these percentages is 62.7. Thus the average pupil in 
his entire spelling course knows how to spell nearly two-thirds of the 
words in the spelling list a month or more¹ before he is asked to study
them. The Test-study plan requires three reviews of the two-thirds 
of the words a pupil knows—in the Monday, Wednesday, and Friday 
tests—and conserves all the rest of the time for mastery of the one- 
third of the words which he does not know. Theoretically, the Test- 
study plan is undoubtedly well conceived to make the most of the 
pupil’s time. It should be realized, moreover, that a test may be an 
effective means of learning. Indeed, in studies of memorizing nonsense 
syllables and other materials a combination in which a self-test (like 
recall) predominates over mere rereading, proved to be markedly 
superior to mere review study.²

From the theoretical point of view, the surprising thing—to the 
writer at least—is the fact that the Pre-test plan does not show to 
greater advantage than it does in this study and others. While it is 
neither a perfect nor a fool-proof plan, its disadvantages seem fewer 
and less serious than those of the Pre-study program. In the writer’s 
opinion most of the limitations of the Pre-test plan previously men- 
tioned are not very serious ones, and most of these may be removed. 
The most conspicuous deficiency of the plan, according to observations 
of the writer and of others, is to be found in weaknesses in the pupil’s 





1 Since the lists used in these tests covered ten weeks of advance assignments, 
the average would be about five weeks. 


2 Gates, A. I.: Recitation as a Factor in Memorizing. Archives of Psychology, 
1917. 
method of study and of management of his own work and the inade-
quacy of the teacher’s supervision of pupil’s individual work. The 
writer’s observations were that in the Study-test plan the pupils 
were held to the use of better techniques of learning, to better distribu- 
tion of time on different words, to more adequate check-up of results. 
In the Test-study plan pupils were given less adequate guidance, 
and often supervision was superficial. As a result, pupils frequently 
dawdled or more commonly utilized poor methods of study, failed 
to do their assignment properly, giving undue time to certain words 
and insufficient time to others, failed to check their work properly, 
and otherwise relaxed into inferior study. This was notably true of the 
least experienced and least intelligent pupils, who were precisely the 
ones among whom the method showed relatively the poorest results. 
THE NATURE OF INTELLIGENCE*


J. H. WILSON 


INTRODUCTORY 


In the discussion of the nature of “general intelligence” and of the
possibility of testing it, an important suggestion has lately been made
by Thorndike. He has put forward a list of eight tests, based
apparently on his own theory, and has implied that they will not
satisfy the well-known criteria deduced by Spearman and his collabo-
rators for demonstrating the existence of “general intelligence”
as a “central factor.”

The issue is of much theoretical value and certainly seems impor- 
tant enough to justify putting it to the test. A small research has 
therefore been planned with this object in view, and the results are 
described in the following pages. 


THE CONDITIONS GOVERNING THE EXPERIMENT 


Thorndike’s list of tests consists of the following well-known 
processes: Memory for digits, pitch discrimination, opposites, defining 
words, completing sentences, arithmetical problems, number series, 
and completing pictures. 

He proposes that tests of this kind be given to 10,000 sixteen-
year-olds, the accuracy being such as to secure reliability coefficients
of .95. Spearman accepts the tests, but adds that the subjects must
be of the same sex, that they should have received reasonably similar 
education, and that all responses to one and the same test must be 
uniformly marked by the same person. 

As put forward here the proposal is obviously beyond the power 
of any one investigator. Spearman suggests, however, a somewhat 
less rigorous procedure which makes it possible to test his prediction. 
First he maintains that such high reliability coefficients are not 
indispensable. Secondly he suggests that a number of independent 
workers might each examine about one hundred pupils. 





* This study has been carried out under the auspices of the Brighton and Hove 
Higher Education Council. It composed part of a thesis submitted for the Ph.D. 
degree of London University. The author is greatly indebted for valuable 
assistance to Professor C. Burt and also to the teachers and students who helped 
in the examination. 
THE TESTS, SUBJECTS AND ADMINISTRATION


In carrying out the present investigation, use was made of published 
material wherever possible. Thus memory for digits was tested as 
described in Whipple’s “Manual of Mental and Physical Tests”
under “auditory” and “visual” memory respectively. Pitch dis- 
crimination was readily measured by Seashore’s apparatus and 
procedure. The verbal tests and the arithmetical ones gave more 
difficulty. Items were collected from published scales; but the best 
selection for pupils of sixteen years had to be discovered by special 
experiment. They were finally chosen after trial on the students 
of a training college for teachers. A test of completing pictures was 
to hand in that used in the American Army tests, namely “A Day
in the Life of a Schoolboy.” 

To avoid any influence there may be in the day of examination 
the tests were prepared in duplicate and given on two days, and to 
avoid any constant factor entering the results the children changed 
seats from time to time throughout the examination. 

The tests were given to some seventy-odd boys ranging in age 
from fifteen and one-half to sixteen and one-half, the average being 
exactly sixteen years and one month. The boys were taken from 
parallel classes of the same form of a grammar school, the form being 
the one preparing for the General School Certificate Examination. 
In consequence of these limitations they formed a highly selected group. 

Each test was uniformly marked by the same person. In certain 
cases this marking was done by Miss D. King of Brighton, in others 
by the writer, who takes this opportunity of thanking her for such 
valuable help. Sixty of the pupils had taken both examinations and 
their results were used. 


AVERAGE SCORES, STANDARD DEVIATIONS AND RELIABILITIES


These values are given in Table I. 

The first point to be considered is the value of each reliability 
coefficient. The majority of the coefficients lie between .50 and 
.70. In comparison with .95, the figure suggested by Thorndike, 
these results are disappointing. Considering, however, how highly 
selected the group is in age and educational attainments, they compare 
favorably with those of other workers in allied fields.*





* Working with students of university rank, Hazlett obtained values which
ranged from .56 down to .29. The writer, with younger pupils, obtained with
equivalent tests from National A and Terman Group Test of Mental Ability, .37;
from Otis and National A, .64; and from Terman and Otis, .72.
The table also includes the multiples of the present length of each
test required to give a reliability equal to that demanded by Thorndike. 
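
The multiples in the last column of Table I agree with what the Spearman-Brown prophecy formula gives, although the text does not name the formula that was used. The following is a minimal sketch under that assumption; the function name `length_multiple` is illustrative and is not taken from the paper.

```python
def length_multiple(r_now, r_target=0.95):
    """Spearman-Brown: factor by which a test of reliability r_now
    must be lengthened to reach reliability r_target (assumed formula)."""
    return r_target * (1 - r_now) / (r_now * (1 - r_target))


# Reliability coefficients as printed in Table I, keyed by test number.
reliabilities = {
    "1a": 0.51, "1b": 0.65, "2": 0.78, "3": 0.57, "4": 0.50,
    "5": 0.53, "6": 0.62, "7": 0.73, "8": 0.73,
}

# Rounded to whole multiples, as in the last column of the table.
multiples = {test: round(length_multiple(r)) for test, r in reliabilities.items()}
print(multiples)
```

On this reading, a test with reliability .50 would have to be made nineteen times as long to reach .95, which shows how far the present tests fall short of Thorndike's requirement.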

Table II gives the intercorrelations of the tests worked out by the 
usual product-moment formula. These, it will be noticed, are small, 
for the majority lie between .40 and 0. In large measure this was to 
be expected on account of the great homogeneity of the examinees. 
The negative values are not significant. 


TABLE I.—AVERAGES, STANDARD DEVIATIONS, AND RELIABILITIES


                                        Average          Standard deviation                  Multiple for
Test                      Number    Form 1    Form 2     Form 1     Form 2    Reliability   reliability .95
Memory for digits:
  (a) Auditory              1a        7.30      6.52       .862       .940        .51             18
  (b) Visual                1b       14.38     13.10      1.480      3.050        .65             10
Pitch discrimination        2        37.23     34.29      6.718      8.570        .78              5
Opposites                   3        12.98     13.52      3.852      3.420        .57             14
Supplying words             4        16.82     12.45      4.890      3.610        .50             19
Defining words              5        11.50     10.28      2.550      3.330        .53             17
Number series               6        12.75      4.93      3.330      3.120        .62             12
Arithmetic problems         7        10.80      7.90      3.670      3.050        .73              7
Picture completion          8        79.52     45.27     13.700     15.100        .73              7

TABLE II.—INTERCORRELATIONS


Group     Test                   No.     1a      1b      2       3        4        5       6       7       8
Memory    Memory                 1a      ..     .355    .211    .454     .259     .357    .206    .209    .368
          Memory                 1b     .355     ..    -.162    .276     .184     .378    .002    .017    .232
          Pitch                  2      .211   -.162     ..     .113    -.038    -.086    .295   -.070    .004
Verbal    Opposites              3      .454    .276    .113     ..    [illeg.]   .618    .398    .371    .363
          Supplying words        4      .259    .184   -.038  [illeg.]    ..      .430    .315    .222    .124
          Defining words         5      .357    .378   -.086    .618     .430      ..     .261    .328    .327
          Number series          6      .206    .002    .295    .398     .315     .261     ..     .564    .125
          Arithmetic problems    7      .209    .017   -.070    .371     .222     .328    .564     ..     .154
          Picture completion     8      .368    .232    .004    .363     .124     .327    .125    .154     ..
SPEARMAN’S CRITERIA AND PREDICTIONS
The criterion is as follows: Take any four tests and let the four
selected be denoted by the letters a, b, p, and q. Then, if the coeffi-
cients of correlation are determined solely by a central factor,


$$F = r_{ap} r_{bq} - r_{bp} r_{aq} = 0$$
to the degree that should be expected from the sampling “probable
errors” involved. The quantity F is termed the “tetrad difference.”
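
As an illustration of the criterion, the sketch below computes the three tetrad differences that any four tests supply and counts how many arise from a battery of n tests (three per selection of four, which gives 378 for nine tests). The loadings are hypothetical values chosen for the example, and the helper names are not from the paper; under a single central factor every correlation is the product of two loadings, so each F vanishes.

```python
from itertools import combinations
from math import comb


def corr(r, x, y):
    """Look up the correlation between tests x and y, in either order."""
    return r[frozenset((x, y))]


def tetrad_differences(r, a, b, p, q):
    """Return the three tetrad differences supplied by tests a, b, p, q."""
    return (
        corr(r, a, p) * corr(r, b, q) - corr(r, b, p) * corr(r, a, q),
        corr(r, a, b) * corr(r, p, q) - corr(r, b, p) * corr(r, a, q),
        corr(r, a, b) * corr(r, p, q) - corr(r, a, p) * corr(r, b, q),
    )


def number_of_tetrads(n_tests):
    """Three tetrad differences for every selection of four tests."""
    return 3 * comb(n_tests, 4)


# Hypothetical one-factor data: r_xy = g_x * g_y (loadings are made up).
loadings = {"a": 0.9, "b": 0.7, "p": 0.5, "q": 0.3}
r = {
    frozenset((x, y)): loadings[x] * loadings[y]
    for x, y in combinations(loadings, 2)
}
values = tetrad_differences(r, "a", "b", "p", "q")
print(values, number_of_tetrads(9))
```

With real data the values of F are not exactly zero, which is why each must be compared with its probable error.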


APPLICATION OF THESE CRITERIA 


There are three hundred seventy-eight different values of F. 
Each of these values has been computed, and the distribution of the 
results is shown in Table III. From this table it is readily seen that 
many values differ from zero. At once there arises the question
whether such differences are significant.

To answer this question the probable error of the quantity F must 
be known. This has been found by Spearman and Holzinger.* 


TABLE III.—TETRAD DIFFERENCES (N = 3 × nC4 = 378)


RANGE          FREQUENCY
.000-.021¹         85
.021-.063         148
.063-.105          83
.105-.147          34
.147-.189          21
.189-.231           6
.231-.273           1


1 The distribution has been made in this way to facilitate the construction of 
Fig. 1. This curve is then symmetrical. 


To compare the magnitude of each “tetrad difference” with 
its probable error, the expedient has been adopted of dividing the 





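* Where
$$F = r_{13} r_{24} - r_{23} r_{14}$$
the probable error of F is

$$.6745 \left[ \frac{1}{N} \left\{ r_{13}^2 + r_{24}^2 + r_{23}^2 + r_{14}^2 - 2\,( r_{12} r_{13} r_{23} + r_{12} r_{14} r_{24} + r_{34} r_{13} r_{14} + r_{34} r_{23} r_{24} ) + 4\, r_{13} r_{24} r_{23} r_{14} \right\} + \text{[a further term in factors of the form } (1 - r^2)^2 \text{, illegible]} \right]^{1/2}.$$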


The difference between this and the true value is made up of two expressions 
involving the fourth orders of the standard deviations of the correlation coefficients. 
These are negligibly small except for small values of the correlation or small 
numbers of pupils and may be calculated by means of the formula of Filon and 
Pearson (Phil. Trans. Royal Soc., London, A CXCI, p. 262). Here, however, they 
are negligible. 

Even so, the formula remains very cumbersome and it has been customary to 
use in its place certain approximations. This procedure will not be followed here 
because one of the main difficulties experienced in the theory of two factors has 
been the use of substitute criteria for the true values. On the contrary, the full 
formula has been used. 
former by the latter. These quotients are summarized in Table IV.
In simple sampling a value for the quotient just exceeding 1.00 is
as likely to occur as not; a value just exceeding 2.00 is likely to occur
about eighteen times in one hundred trials; a value just exceeding 3.00
is likely to occur about once in twenty-three trials; and a value just
exceeding 4.00, once in one hundred forty-three trials. Until a quotient
exceeds four there can be no great confidence that it is likely to be
“significant.” A value of five is usually required in statistical work.
A value of three is, however, considered “suggestive.”
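
The chances quoted above can be checked against the normal curve. The sketch below assumes the usual convention that one probable error equals .6745 of a standard deviation; the function name `chance_of_exceeding` is illustrative.

```python
from math import erfc, sqrt

PE_IN_SIGMA = 0.6745  # one probable error expressed in standard deviations


def chance_of_exceeding(k):
    """Two-sided chance that a normal deviate exceeds k probable errors."""
    return erfc(k * PE_IN_SIGMA / sqrt(2))


for k in (1, 2, 3, 4):
    p = chance_of_exceeding(k)
    print(f"quotient > {k}: p = {p:.4f}, about 1 in {1 / p:.0f}")
```

The results come out at about one in two, eighteen in a hundred, one in twenty-three, and one in one hundred forty-three, in agreement with the figures in the text.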

In Table IV twenty-two of the quotients are greater than three 
and one is greater than five. These large values constitute a small 
proportion of the total, and it is reasonable to ask whether they could 
be expected on the basis of sampling errors. 


TABLE IV.—VALUES OF QUOTIENTS F/PE


RANGE OF
QUOTIENT       FREQUENCY
0-½                80
½-1                70
1-1½               66
1½-2               55
2-2½               50
2½-3               35
3-3½               15
3½-4                5
4-4½                1
4½-5                0
5-5½                0
Greater             1


To decide this question two expedients are available. 

In compliance with the first of these devices the values of the 
quotients have been distributed as is shown in Table V. Column 1 
gives the range of the values of the quotients, while Column 2 gives 
the frequency of their occurrence. In Column 4 the frequency which 
is to be expected, were the distribution normal, is given. Column 5 
gives the difference between Columns 2 and 4, and these differences 
are large. 
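
The expected frequencies of Column 4 can be reproduced approximately from the normal curve. The sketch below also gives a probable error for an observed frequency, using .6745 times the binomial standard deviation with the observed proportion; that formula is an assumption, since the text does not state how Column 3 was obtained, though it comes close to the tabled values.

```python
from math import erfc, sqrt

PE_IN_SIGMA = 0.6745  # one probable error expressed in standard deviations


def expected_within(n, k):
    """Expected number of n quotients lying between 0 and k in absolute size."""
    return n * (1 - erfc(k * PE_IN_SIGMA / sqrt(2)))


def pe_of_frequency(n, observed):
    """Probable error of an observed frequency (assumed binomial formula)."""
    p = observed / n
    return PE_IN_SIGMA * sqrt(n * p * (1 - p))


n = 378
for k, observed in ((1, 150), (2, 271), (3, 356)):
    print(k, observed, round(expected_within(n, k), 1),
          round(pe_of_frequency(n, observed), 1))
```

For the range 0-1, for example, 150 quotients are observed where about 189 would be expected, a deficit of several times the probable error.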

The probable errors of the differences in Column 5 are readily 
calculated when the cases entering into the distribution are independent 
observations. They are not independent in the case of “tetrad differ- 
ences,” for many of these contain the same r’s, and the different r’s 
themselves have intercorrelated sampling errors. Values of probable 
errors calculated on the assumption of the independence of the observa-
tions will therefore be too small. They will give, however, a first 
approximation. Such values are given in the third column of the table, 
and in the last are to be found the quotients obtained by dividing 


TABLE V.—SHOWING VARIATION FROM NORMALITY OF TABLE IV


Quot. = F/PE   Frequency   Probable error   Frequency expected   Diff.   Quot. = Diff./PE
0-1               150           6.5               189             -39         -6.0
0-2               271           5.9               310             -39         -6.6
0-3               356           3.1               362               6          2.0
0-4               376           1.0               375               1          1.0
0-5               377           0.7               378               1          1.4
0-6               378           0                 378               0








the value in Column 5 by that in Column 3. Examination of these 
values makes it highly probable that the distribution of the “tetrad
differences” is not normal.

Application of the second device supports this conclusion. By 
means of Table III the histogram shown in Fig. 1 has been constructed. 
In Spearman’s “Mental Abilities of Man” (Appendix, p. xi) is given
the procedure that is to be followed in order to construct the normal 
curve to be expected were the differences due to sampling errors alone. 
This curve is shown in Fig. 1, and it is evident there are variations from 
what is to be expected by sampling errors. 

What, then, is the cause of these perturbations? 

Before seeking the causes it is necessary to consider the question 
at issue. To demonstrate the presence of a “central factor” the 
absence of “tetrad differences” in significant amounts was sought.
Their presence demonstrates that there is correlation over and beyond
that due to the “central factor.” On the unifocal theory of Spearman
this additional correlation, termed “specific correlation,” can only
be due to two or more of the mental performances here tested having
in common the same “specific” factor.

Further, according to this view, such “group” factors (as they are
called) have very narrow incidence. 

Two methods may be employed in studying this question. First 
those values of F which are large in comparison with their appropriate 
probable errors may be considered. Examination of these “tetrad 
differences” will help to indicate those tests which have “group
factors.” There are twenty-two values greater than three times the
probable errors involved, and eighteen of these, including all greater
than four times the probable error, may readily be accounted for by
too large values of the coefficients $r_{34}$, $r_{45}$, $r_{35}$, $r_{1a\,1b}$, and $r_{67}$. In particu-
lar the coefficient of correlation between the memory tests occurs
in four, those between the verbal tests in five, and that between the
arithmetical tests in nine, including the greatest value. As but one of 
these values reaches the standard of five times the probable error, it 
is impossible to do more than say there is a suggestion of the presence 
of a group factor among the verbal tests, slightly greater suggestion of 
one between the memory tests, and by far the most evidence in favor 
of a group factor between the arithmetical tests. 

Recourse is now had to the second method, in which the amounts 
of “specific correlation” are computed. Such computation is effected 
by the use of partial correlation.* To apply the method to a given
pair of tests necessitates finding the correlation of each test with the
central factor. The most reliable way of obtaining each of these two
coefficients is to use all available pairs of tests, but all coefficients of
correlation introduced must obey the tetrad equation.† Hence in
obtaining the coefficients of correlation of each test with the central
factor‡ (see Table VI) no use has been made of the coefficients for the
memory tests, nor of those for the arithmetical ones, nor of those for 
the verbal ones. The values, it will be noticed, are in the main 
small, the third test (opposites) being the only exception. 
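
The procedure just described can be sketched as follows, reading the dagger footnote as Spearman's ratio of sums (the square of the saturation equals the sum of products r_ax r_ay divided by the sum of the corresponding r_xy) and dropping every term that involves an excluded coefficient. Both that reading and the data are assumptions: the loadings are hypothetical, and one correlation is deliberately inflated to imitate a group factor.

```python
from itertools import combinations
from math import sqrt


def saturation(r, a, tests, excluded=frozenset()):
    """Correlation of test a with the central factor, from all usable pairs."""
    def usable(x, y):
        return frozenset((x, y)) not in excluded

    numerator = denominator = 0.0
    others = [t for t in tests if t != a]
    for x, y in combinations(others, 2):
        # Skip any term built on a coefficient that breaks the tetrad equation.
        if usable(a, x) and usable(a, y) and usable(x, y):
            numerator += r[frozenset((a, x))] * r[frozenset((a, y))]
            denominator += r[frozenset((x, y))]
    return sqrt(numerator / denominator)


# Hypothetical loadings on the central factor for five made-up tests.
loadings = {"t1": 0.9, "t2": 0.7, "t3": 0.5, "t4": 0.3, "t5": 0.6}
tests = list(loadings)
r = {
    frozenset((x, y)): loadings[x] * loadings[y]
    for x, y in combinations(tests, 2)
}
# Imitate a group factor shared by t4 and t5 only.
group_pair = frozenset(("t4", "t5"))
r[group_pair] += 0.2

with_exclusion = saturation(r, "t1", tests, excluded={group_pair})
without_exclusion = saturation(r, "t1", tests)
print(round(with_exclusion, 3), round(without_exclusion, 3))
```

In this made-up case the loading of .9 is recovered only when the inflated coefficient is left out, which is the reason the text gives for setting aside the memory, verbal, and arithmetical pairs.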


TABLE VI.—COEFFICIENTS OF SATURATION


TEST    COEFFICIENT
1a         .576
1b         .268
2          .040
3          .910
4          .447
5          .675
6          .468
7          .296
8          .387


The coefficients of correlation between the specific factors are 
summarized in Table VII. In the same table are to be found the 
quotients obtained by dividing each coefficient by its appropriate 
probable error. There are few large values for these quotients. 
Values greater than three occur in the case of the two memory tests, 
also in the case of defining words and the visual memory tests, again 
in that of supplying words and opposites tests, in that of arithmetical 
problems and opposites and finally in the case of the two arithmetical 





* Write $r_{ab}$ for the correlation between two tests a and b and $r_{s_a s_b}$ for the
correlation between the factor specific to a and that specific to b, and there ensues
by the formula of Yule for partial correlation, the equation

$$r_{s_a s_b} = \frac{r_{ab} - r_{ag} r_{bg}}{\sqrt{(1 - r_{ag}^2)(1 - r_{bg}^2)}}$$

where g denotes the factor common to both a and b. For the correlation between
the specific factors will be that found when g is held constant. These values have
been found for all the possible pairs of tests used.

† Let $r_{ag}$ be the coefficient required, then

$$r_{ag}^2 = \frac{r_{ab} r_{ac} + r_{ab} r_{ad} + \cdots + r_{ax} r_{ay} + \cdots}{r_{bc} + r_{bd} + \cdots + r_{xy} + \cdots}$$

a, b, c, etc., being the tests.
‡ These coefficients are technically termed “saturation coefficients.”
tests. The only significant value is that obtained for the two arith-
metical tests.
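
As a worked check on the method, the sketch below applies Yule's partial-correlation formula to two pairs of tests, taking the raw correlations from Table II and the saturations from Table VI. The function name `specific_correlation` is illustrative.

```python
from math import sqrt


def specific_correlation(r_ab, r_ag, r_bg):
    """Correlation between tests a and b with the central factor g held constant."""
    return (r_ab - r_ag * r_bg) / sqrt((1 - r_ag ** 2) * (1 - r_bg ** 2))


# Number series (6) and arithmetical problems (7): r = .564, saturations .468 and .296.
arithmetical = specific_correlation(0.564, 0.468, 0.296)
# Supplying words (4) and defining words (5): r = .430, saturations .447 and .675.
verbal = specific_correlation(0.430, 0.447, 0.675)

print(round(arithmetical, 3))  # 0.504
print(round(verbal, 3))        # 0.194
```

Both results agree with the corresponding entries of Table VII.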


A further investigation may now be attempted. Tetrad differences 
may be selected in the following way: 


TABLE VII.—COEFFICIENTS OF CORRELATION BETWEEN SPECIFIC FACTORS

(Under each coefficient is the quotient obtained by dividing it by its probable error.)

Test                    No.     1a       1b       2       3       4       5       6       7       8
Memory, auditory        1a      ..      .359     .230   -.206    .003   -.053   -.089    .050    .192
                                         4.7      2.9     2.4      .0      .6     1.0      .5     2.3
Memory, visual          1b     .359      ..     -.180    .080    .074    .277   -.145   -.067    .144
                                4.7               2.1      .9      .9     3.4     1.7      .8  [illeg.]
Pitch                   2      .230    -.180      ..     .186   -.063   -.153    .230   -.086   -.012
                                2.9      2.1              2.2      .8     1.8     2.8     1.0      .1
Opposites               3     -.206     .080     .186     ..     .323    .013   -.076    .258    .029
                                2.4       .9      2.2             4.1      .1      .9     3.1      .3
Supplying words         4      .003     .074    -.063    .323     ..     .194    .134    .105   -.059
                                 .0       .9       .8     4.1             2.3     1.5     1.2      .7
Defining words          5     -.053     .277    -.153    .013    .194     ..    -.084    .182    .097
                                 .6      3.4      1.8      .1     2.3              .9     2.2     1.1
Number series           6     -.089    -.145     .230   -.076    .134   -.084     ..     .504    .068
                                1.0      1.7      2.8      .9     1.5      .9             7.7      .8
Arithmetical problems   7      .050    -.067    -.086    .258    .105    .182    .504     ..     .044
                                 .5       .8      1.0     3.1     1.2     2.2     7.7              .5
Picture completion      8      .192     .144    -.012    .029   -.059    .097    .068    .044     ..
                                2.3   [illeg.]     .1      .3      .7     1.1      .8      .5






Take any four tests, including only one at a time from the verbal 
tests, one only from the arithmetical tests, and one only of the memory 
tests. The coefficients are determined solely by a “central factor,”
and the tetrad equation ought to be satisfied to the degree to be 
expected from the sampling errors. 
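
The selection rule can be made concrete by enumeration. The sketch below lists every set of four tests that contains at most one memory test, one verbal test, and one arithmetical test; the grouping follows the test numbers of Table I, and the variable names are illustrative.

```python
from itertools import combinations

TESTS = ["1a", "1b", "2", "3", "4", "5", "6", "7", "8"]
GROUPS = {
    "memory": {"1a", "1b"},
    "verbal": {"3", "4", "5"},
    "arithmetical": {"6", "7"},
}


def admissible(four):
    """True when no suspected group contributes more than one test."""
    chosen = set(four)
    return all(len(chosen & members) <= 1 for members in GROUPS.values())


selections = [four for four in combinations(TESTS, 4) if admissible(four)]
# Each selection of four tests supplies three tetrad differences.
print(len(selections), 3 * len(selections))
```

The count comes to forty selections and one hundred twenty tetrad differences, the total shown as N in Table VIII.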


TABLE VIII.—TETRAD DIFFERENCES PREDICTED AS SATISFYING CRITERIA
(N = 120)


RANGE          FREQUENCY
.000-.015¹         23
.015-.045          39
.045-.075          35
.075-.105          13
.105-.135           8
.135-.165           2


1 The distribution has been made in this way to facilitate the construction of
Fig. 2. This curve thus becomes symmetrical.

This may be done by constructing a series of tables similar to 
Table II, in each of which is placed a selection of four of the tests. 
In this way forty such tables may be constructed, and from each table
there will ensue three values of the quantity F, making a grand total 
of one hundred twenty. These values are distributed in Table VIII 


TABLE IX.—QUOTIENTS = F / PE OF F


QUOTIENT = F/PE     FREQUENCY
0-1                     47
1-2                     49
2-3                     22
3-4                      2
Over 4                   0
in a form suitable for the formation of the histogram given in Fig. 2.
In the same figure is drawn the normal curve to be expected were the
differences due to sampling errors. Table IX and Table X have been
modeled on lines similar to those employed in Table IV and Table V
respectively.
Both of the results demonstrate that the data satisfy the criterion
laid down by Spearman. The value of each of the “tetrad differences”
considered (and here there are as many as one hundred twenty) 
approximates towards zero within the limits of the probable errors 
involved. 


TABLE X.—SHOWING NORMALITY OF DISTRIBUTION IN TABLE IX


Quotient = F/PE   Frequency   Probable error   Frequency expected   Differences   Value of Diff./PE
0-1                   47          3.60                60                 13              3.7
0-2                   96          2.96                98                  2              0.7
0-3                  118           .95               115                  3              3.2
0-4                  120          0                  120                  0




















TABLE XA.—GROUP FACTORS


                                  Correlation
                  Number          between specific    Probable
Group             of test         factors             error        Quotient
Memory            1a and 1b           .359               .076          4.7
Verbal            3 and 4             .323               .078          4.1
                  3 and 5             .013               .087          0.1
                  4 and 5             .194               .084          2.3
Arithmetical      6 and 7             .504               .065          7.7
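
The probable errors in Table XA agree with the customary formula for the probable error of a correlation coefficient, .6745(1 − r²)/√N, taken with the sixty boys whose results were used. The text does not state the formula, so the sketch below treats it as an assumption; the function name is illustrative.

```python
from math import sqrt


def probable_error_of_r(r, n):
    """Probable error of a correlation coefficient r from n cases (assumed formula)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


N_BOYS = 60
# Specific correlations as printed in Table XA.
specific = {"1a-1b": 0.359, "3-4": 0.323, "3-5": 0.013, "4-5": 0.194, "6-7": 0.504}

for pair, r in specific.items():
    pe = probable_error_of_r(r, N_BOYS)
    print(pair, round(pe, 3), round(r / pe, 1))
```

The probable errors round to the tabled values; the quotients can differ from the printed ones by a tenth, depending on whether the probable error is rounded before dividing.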





A SIMILAR EXPERIMENT WITH GIRLS


To corroborate these results and to see whether there were any 
sex differences the experiment was repeated with girls. The tests, 
with the exception of the “visual memory” for digits (Test 1b), were
therefore given to fifty girls in the parallel forms of a high school,
i.e., to the forms preparing for the School Certificate Examination.
The average age of these girls was slightly in excess of sixteen. The 
ages fell within a somewhat narrower range than did those of the boys. 

To save space, only three tables are presented. In the first of 
these, Table XI, are to be found the intercorrelations; in the second, 
Table XII, the “coefficients of saturation” with the “central factor”;
and in the third, Table XIII, the coefficients of correlation of the
“specific factors.”

Examination of these tables shows very close agreement with the 
results already obtained with the boys. 

The values of the tetrad differences (of which there are two hundred 
ten) showed perturbations from what was to be expected by sampling 
errors alone. Analyzing the results exactly as was done in the first 


TABLE XI.—INTERCORRELATIONS

N = 50

Test     1a      2       3       4       5       6       7       8
1a       ..     .324    .462    .289    .156    .030    .149    .214
2       .324     ..     .238    .258    .113    .282    .167    .024
3       .462    .238     ..     .525    .225    .146    .164    .163
4       .289    .258    .525     ..     .567    .287    .369    .488
5       .156    .113    .225    .567     ..     .328    .343    .543
6       .030    .282    .146    .287    .328     ..     .528    .137
7       .149    .167    .164    .369    .343    .528     ..     .358
8       .214    .024    .163    .488    .543    .137    .358     ..





experiment led to a similar explanation, which is most clearly seen 
from Table XIII. There are values of the coefficients of correlation 
between the specific factors greater than three, and one greater than 
five, times the probable errors involved. 

Both experiments agree in demonstrating the presence of a “group
factor” in the arithmetical tests. 


TABLE XII.—COEFFICIENTS OF SATURATION

TEST    COEFFICIENT
1a         .444
2          .365
3          .550
4          .838
5          .614
6          .355
7          .523
8          .466


The “suggestive” values (those greater than three) differ in the
two experiments. In the second experiment these occur in the case
of “auditory” memory and opposites, and in that of defining words
and picture completion. In the first experiment they occurred with
“visual” memory and defining words and with arithmetical problems
TABLE XIII.—SPECIFIC CORRELATIONS

N = 50

Test      1a       2        3        4        5        6        7        8
1a        ..     +.194    +.291    -.169    -.165    -.152    -.108    +.008
2       +.194     ..      +.047    -.094    -.151    +.174    -.030    -.177
3       +.291   +.047      ..      +.140    -.171    -.062    -.173    -.125
4       -.169   -.094    +.140      ..      +.120    -.019    -.148    +.200
5       -.165   -.151    -.171    +.120      ..      +.149    +.032    +.368
6       -.152   +.174    -.062    -.019    +.149      ..      +.429    -.033
7       -.108   -.030    -.173    -.148    +.032    +.429      ..      +.150
8       +.008   -.177    -.125    +.200    +.368    -.033    +.150      ..





and opposites. The values are only “suggestive,” and among so 
many values (thirty-six in the first experiment and twenty-eight in the 
second) can most reasonably be accounted for by sampling errors. 
It is interesting to note, however, that specific correlation has been 
manifested for boys and not for girls in at least one experiment,¹
though it must be added that in it the tests involved different mental 
processes. 


SUMMARY OF THE RESULTS


1. The results obtained with the tests used can readily be explained 
on the basis of a “central factor” running through all the performances
and a number of other factors specific to each performance, provided 
that the two specific factors belonging respectively to the arithmetical 
tests overlap. 

2. There is, then, conclusive evidence of a “group” factor in the
arithmetical tests, and this result agrees with those of Rogers² and
Collar. Such group factors are of great importance both theoretically
and practically. They are tacitly assumed in all work on “special
abilities” such as “practical ability,” “musical ability,” and the like.
The demonstration of their presence has been made in but few cases. 
The reasons for this lack of experimental evidence are nowhere more 
succinctly put than by Professor C. Burt.* He writes, “Over specific
inborn abilities I need not linger. For them effective tests have proved 
disconcertingly hard to contrive. Simple correlation is here inappli- 
cable. General intelligence is always getting in the way. We think 
we have tested something specific. We find we have only hit upon 
another test of intelligence. Its ubiquitous influence can only be 
eliminated by some elaborate technical device, the procedure, for 
example, known as multiple correlation; and the complexity of the
whole task bewilders where it does not baffle.*


Nor do these special abilities, although presumably inborn, declare themselves 
at so young an age as the more general. Specialization during the first twelve 
years of childhood is the exception rather than the rule . . . the young child 
contains in fresh and dormant essence the germ of every faculty. Age alone 
betrays our idiosyncrasies. Adolescence is preeminently the period when many 
of these localized talents and specialized interests seem for the first time to mature. 
Accordingly, efforts at vocational guidance and educational specialization must not 
be forced at too early a stage. (Italics mine.) At present, for example, the system 
of junior county scholarships tends to sweep all our brightest children at the age 
of ten or eleven into secondary schools of a somewhat academic type. When at a 
later period examinations are held for trade schools, most of the best instances of 
special talent are missing: they have already been creamed off and drafted into 
other directions less suited to their powers. 


Among the cases where “group” factors do become of appreciable 
magnitude the five most important have been in respect of what may 
be called the logical, the mechanical, the psychological, the arith- 
metical, and the musical abilities. In each of these a group factor 
has been discovered of sufficient breadth and degree to possess serious 
practical consequences—educational, industrial, and vocational. 

3. Group factors are evidently absent even where they might 
otherwise have been expected. There is, for example, no demon- 
strated group factor in the verbal tests nor one in the memory tests. 

Particularly evident is the lack of a group factor between the 
opposites test and that of defining words, for both tests appear to 
depend largely upon understanding the meaning of words. Yet in 
neither of the two experiments is there any appreciable specific 
correlation. The amount of correlation between the specific factors 
in the memory tests is high but not significant. A similarly high value 
was obtained by Carey, in which case it was significant. 

4. The investigation serves to throw light upon a constituent of 
the specific factors of an obvious kind and yet not unlikely to possess 
great importance. It consists in the manner of presentation of the 
task to the pupils. In the case of the verbal and arithmetical tests 
the presentation was made by writing. The presentation was oral 
in the case of the tests of “auditory memory” and pitch discrimination 
and was both visual and oral in those of “visual memory” and “picture
completion.” The evidence derived from this investigation would





* These words were written before the publication of the probable error of the 
“tetrad difference.”’ 
point to there being no specific correlation in the mode of presentation.
The choice between oral and written tests, then, would seem to intro- 
duce no group factor of appreciable magnitude. 

5. The “coefficients of saturation” differ for the different tests 
and also in the two experiments. For the boys, pitch discrimination 
contains but very little of the “central factor,” while the amount
for girls is appreciably higher. A similarly well-marked difference 
occurs in the opposites test, in the supplying words test, and in the 
number series test. As to the test which gives the best measure of the 
“central factor,” this obviously is the opposites test in the case of 
the boys and the supplying words test in the case of the girls. 

6. To be effective in the evaluation of amount of the “central 
factor” present in any particular testee’s performance, a coefficient 
of saturation in the neighborhood of .995 is needed. None of the tests 
used in this investigation even approaches this perfection. 
ON PARTIAL CORRELATION VS. PARTIAL
REGRESSION FOR OBTAINING THE 
MULTIPLE REGRESSION EQUATIONS 


HAROLD D. GRIFFIN 


Eureka Springs, Arkansas 


THE REGRESSION EQUATION IN EDUCATIONAL PSYCHOLOGY


In the physical sciences, where exact experimental conditions are 
comparatively easy to maintain, it is possible to predict with con- 
siderable confidence the effect of a given set of causes under like 
conditions. In educational psychology, as well as in many other 
fields, we are unable to isolate factors with such ease. It has therefore 
been necessary to develop a technique by which one may draw from 
a mass of data, often conflicting, some conclusion which will represent 
the most probable relation between variables. This is the correlation 
technique. 

Correlating two variables, however, is but the beginning of serious 
study. The educational psychologist seeks means of prediction and 
control. For example, no sooner were we measuring intelligence 
than we were correlating intelligence with school marks, obtaining 
regression coefficients, and attempting to forecast school success. 
But as intelligence tests correlate only from 0.40 to 0.65¹ or so with
school marks, educators began to use the multiple regression procedures
to improve their predictions. Thus, May² sought a better prediction
of college success by including time spent in study as a third variable.
This produced a multiple correlation coefficient ($R_{0.12}$) of 0.824,
which was considerably better than the simple correlation ($r_{01}$)
of intelligence and college success, which was 0.60.
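
The figures quoted here and in the accompanying footnote can be checked with a short sketch. It assumes that "forecasting efficiency" means the index E = 1 - sqrt(1 - r^2), which the text does not define, and it uses the standard three-variable formula for the multiple correlation. Only r01 = 0.60 comes from the text; the values of r02 and r12 are hypothetical, since the paper does not report them.

```python
import math


def forecasting_efficiency(r):
    """Index of forecasting efficiency, E = 1 - sqrt(1 - r^2) (assumed definition)."""
    return 1.0 - math.sqrt(1.0 - r * r)


def multiple_r(r01, r02, r12):
    """Multiple correlation of variable 0 with variables 1 and 2 taken together."""
    r_squared = (r01 ** 2 + r02 ** 2 - 2.0 * r01 * r02 * r12) / (1.0 - r12 ** 2)
    return math.sqrt(r_squared)


# Efficiency for the two correlations named in the footnote.
print(round(forecasting_efficiency(0.40), 3))
print(round(forecasting_efficiency(0.65), 3))

# r01 = 0.60 is from the text; r02 = 0.50 and r12 = 0.10 are invented for illustration.
print(round(multiple_r(0.60, 0.50, 0.10), 3))
```

Under this definition the two efficiencies come out near eight and twenty-four per cent, close to the footnote's round figures.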


DIFFICULTIES IN PRESENT PROCEDURES 


Multiple regression equations might be employed to a much greater 
extent were it not for the difficulties, both real and fancied, of the 


1 A correlation coefficient of 0.40 has but eight per cent forecasting efficiency, and
one of 0.65 has but twenty-five per cent. See Hull, C. L.: The Correlation Coefficient
and Its Prognostic Significance. Journal of Educational Research, Vol. XV,
1927, pp. 327-338; also Wallace and Snedecor: “Correlation and Machine Calculation.”
Official Publication, Vol. XXIII, No. 35, Iowa State College of Agriculture,
1925, p. 17.

2 May, M. A.: Predicting Academic Success. Journal of Educational Psychology,
Vol. XIV, 1923, pp. 429-440.
procedures now in vogue. There are two general methods of obtaining
the multiple regression equation, both seeming to have originated 
with Yule. One is the partial correlation method, the other is the 
partial regression method. The former is at best tedious and inade- 
quately checked but, given time and patience, by it one can handle 
almost any number of variables. The latter solves by means of simul- 
taneous linear equations, but the technique that commonly has been 
employed, determinants, is very difficult to handle with more than 
four variables. For a three or four variable problem, the partial 
regression method with any type of solution is much swifter than 
partial correlation methods. Fortunately, there are methods for 
solving simultaneous linear equations that are superior to determinants 
and are capable of handling efficiently any number of variables. 
One such method, the Doolittle, has long been used by the United 
States Coast and Geodetic Survey and by some civil engineers. The 
purpose of the present paper is to bring this method to the attention 
of a larger circle of educational psychologists than at present employ it. 
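
For the three-variable case the simultaneous linear equations of the partial regression method can be written out in full. The sketch below assumes standard-score form, with β denoting the standardized partial regression coefficients; this notation is an illustration and is not taken from the paper.

```latex
\begin{aligned}
\beta_{01.2} + r_{12}\,\beta_{02.1} &= r_{01},\\
r_{12}\,\beta_{01.2} + \beta_{02.1} &= r_{02},
\end{aligned}
\qquad\Longrightarrow\qquad
\beta_{01.2} = \frac{r_{01} - r_{02}\,r_{12}}{1 - r_{12}^{2}},\quad
\beta_{02.1} = \frac{r_{02} - r_{01}\,r_{12}}{1 - r_{12}^{2}},
\qquad
R_{0.12}^{2} = \beta_{01.2}\,r_{01} + \beta_{02.1}\,r_{02}.
```

With two unknowns the determinant solution is immediate; each added variable adds one equation and one unknown, which is where a systematic elimination scheme begins to pay.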


HISTORICAL DEVELOPMENT


The possibilities for prediction in the regression equation have been 
known only to the present generation, and the very concepts of 
correlation and regression are not much older. An historical survey 
of the development of these concepts may reveal why procedures in 
obtaining the multiple regression equation and also efficient methods 
for solution in intermediary steps still lack standardization. 


BRAVAIS AND HIS REPUTED DISCOVERIES


August Bravais, a French geologist and mathematician, was once 
given credit by Pearson for devising (between the years 1838 and 1846) 
formulas for solving correlation and intercorrelation between two and 
three variables. But earlier ideas with regard to Bravais’s place 
in the history of correlation have undergone modification. Pearson 
himself now holds that Bravais, while working with two and three 
variables in geodetic work, developed relationships between his prod- 
uct-sums and Gauss’s mean-errors (our standard deviations) which, 
had these relationships been developed, would have led to symbols 





1 Bravais, A.: Analyse Mathématique. Sur les Probabilités des Erreurs de 
Situation d’un Point. Memoires Académiques de la Royale Scientifique Institute de 
France; Science, Mathématique et Physique, Vol. IX, 1846, pp. 255-332. 

Pearson, K.: Regression, Heredity, and Panmixia. Phil. Trans. of the Royal 
Society Ser. A., 187, 1896, pp. 253-318. (See especially pp. 261, 287.) 
equivalent to our correlation coefficient. It is doubtful, however,
whether Bravais was even thinking of correlations between his observed
quantities.


GALTON AND THE CONCEPT “r”


During the middle of the 1870’s, Francis Galton was seeking a
numerical measure for “reversion.” At a lecture delivered at the
Royal Institution of Great Britain, February 9, 1877, he presented
a symbol for such a measure, and symbolized it “r.” Furthermore,
he presented it in an equation, $c_1^2 = v^2/(1 - r^2)$, which, of course,
may be written $r = \sqrt{1 - v^2/c_1^2}$. In these formulas, $v$ = the
variability of a family of sweet peas, and $c_1$ = the variability of the
general population of sweet peas. That Galton was clear in his
mathematical reasoning may be seen by comparing this formula with
Mills’s formula for the measure of correlation, $r = \sqrt{1 - S_y^2/\sigma_y^2}$,²
or with the correlation ratio in its original form, $\eta_{yx} = \sqrt{1 - \sigma_{ay}^2/\sigma_y^2}$.
Galton’s lecture was published in Nature, Vol. XV, for 1877, where it 
may be consulted.³
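
The step from Galton's equation to the expression for r is a single rearrangement. It is sketched here with v and c as defined above; the subscript on c is an assumed reading of the scan and may not match Galton's own typography.

```latex
c_1^{2} = \frac{v^{2}}{1 - r^{2}}
\;\Longrightarrow\;
1 - r^{2} = \frac{v^{2}}{c_1^{2}}
\;\Longrightarrow\;
r = \sqrt{1 - \frac{v^{2}}{c_1^{2}}}
```

The positive root is taken. The result has the same shape as the two later formulas cited in the paragraph: one minus a ratio of a residual variance to a total variance, under a square root.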

About ten years later Galton developed an empirical method for 
determining the correlation between two variables. He made a 
distribution chart and drew a line across it in such a way as to touch 
as near the mean of as many rows as possible. He then measured 
the angle of the deviation of this line of best fit from the vertical. 
The tangent of this angle gave him his index of reversion, or “regres- 
sion” as he was now terming it. Tangents were used because they 
swing from +1.00 through 0 to — 1.00, thus forming a very convenient 
measure of varying degrees of deviation from perfect positive to perfect 
negative. For some time this r was called Galton’s function, due to
the use of this term by Weldon, who applied correlation in his measurement
of various sea life.⁴














1 Pearson, K.: Notes on the History of Correlation. Biometrika, Vol. XIII, 
1920, pp. 25-45. (See especially pp. 28-32.) 


Darmois, G.: “Statistique Mathématique,” first edition. Octave Doin, Paris,
1928. See p. 246ff.

2 Mills, F. C.: “Statistical Methods Applied to Economics and Business,” first
edition. Henry Holt and Co., 1924, pp. 437, 442.

3 Galton, F.: Typical Laws of Heredity. Nature, Vol. XV, 1877, pp. 492-495,
512-514, 532-533. (See especially pp. 532-533.)

4 Weldon, W. F. R.: Certain Correlated Variations in Crangon Vulgaris.
Proceedings of the Royal Society, Vol. LI, 1892, pp. 2-21. 

Weldon, W. F. R.: On Certain Correlated Variations in Carcinus Moenas. 
Proceedings of the Royal Society, Vol. LIV, 1893, pp. 318-329. 
EDGEWORTH AND THE TERM “COEFFICIENT OF CORRELATION”


F. Y. Edgeworth (1892) dealt with Galton’s function for three 
variables, and indicated how the method might be applied up to six 
variables, making the assumption of normal distribution. He used 
the phrase “coefficient of correlation” instead of reversion, regression,
or Galton’s function, and the term persisted.¹


PEARSON AND THE PRODUCT-MOMENT r


By 1896 Pearson (Footnote 1, p. 36) had introduced the product-sum 
method, enabling the coefficient of correlation to be calculated without 
the use of an isopleth or thread. Since then we commonly speak of the 
Pearson product-moment r. Many interesting variations of the 
Pearson product-moment formula have arisen since its first statement. 
Symonds has listed some fifty or more variants.²


YULE’S TWO METHODS IN MULTIPLE CORRELATION


Partial Regression Method.—G. U. Yule, then an assistant of 
Pearson, gave a discussion of the product-sum methods in their 
applications to correlation in two papers published during 1897. 
In his paper on the theory of correlation Yule obtained partial regres- 
sion coefficients in a three variable problem by solving an observation 
equation set up by the method of least squares, using determinants. 
Merriman, whose “Textbook on Method of Least Squares” (sixth
edition) Yule used for setting up his equation, suggests the Gauss 
direct method of solution (the basis of the Doolittle method) on pages 
51-65 of the edition used by Yule, but the hold of the determinant 
method of solution seems so strong on British mathematicians that 
neither Yule nor his immediate followers profited by Merriman’s 
suggestions. We thus find that our newly revived partial regression 
method was really the earlier method for obtaining relations between 


1 Edgeworth, F. Y.: On Correlated Averages. Philosophical Magazine, Fifth 
Series, Vol. XXXIV, 1892, pp. 190-204. 

2 Symonds, P. M.: Variations of the Product-Moment (Pearson’s) Coefficient
of Correlation. Journal of Educational Psychology, Vol. XVII, 1926, pp. 458-469.

3 Yule, G. U.: On the Significance of Bravais’ Formulae for Regression, Etc.,
in the Case of Skew Correlation. Proceedings of the Royal Society, Vol. LX, 1897,
pp. 477-489.

Yule, G. U.: On the Theory of Correlation. Journal of the Royal Statistical
Society, Vol. LX, 1897, pp. 812-854.

4 Merriman, M.: “Textbook on Method of Least Squares.” New York: John
Wiley and Sons. (Many editions, the eighth is of 1911.)
three or more variables. Failure to appreciate the use that could
be made of the coefficient of multiple correlation prevented the 
development of the partial regression method for some years. As 
late as 1906 Yule could write, “No practical use has, we believe, been
made of [the coefficient of multiple correlation] . . . but it appears 
to have considerable importance, and may be indicative of the closeness 
of the causal connection between one variable and the joint influence of 
two other variables of which the first is a function.” 

Partial Correlation Method.—It was this pursuit of causal connec- 
tions that diverted the partial regression method. Yule had been 
attracted to the possibilities in partial correlation for determining 
causal relationships. Consequently he developed a method for 
solving partial r’s that leads indirectly toward the coefficient of multi- 
ple correlation, but is very laborious. As yet he had neither named 
nor symbolized the regression coefficient. Yule was a careful investi- 
gator and a painstaking reporter; therefore his writings carried the 
weight of authority, and the methods and terminology that he preferred 
became the accepted standards for other statistical workers. In 1907 
Yule presented the system of notation substantially as used today.²
It was at this time that b was introduced as the symbol for the regres-
sion coefficient, and that a system of subscripts was developed to
represent dependent and independent variables. Yule first published
his “Introduction to the Theory of Statistics”³ in 1910-1911. This
book has been exceedingly popular and has run into many editions.
It has served to codify Yule’s methods and techniques, so that statisti-
cians who have been interested primarily in obtaining regression
equations, yet only incidentally in partial correlation, have patiently
developed their equations by his partial correlation technique.⁴
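
The labor of the partial correlation route is easier to appreciate from the recursion itself. The sketch below uses the standard formula for a first-order partial correlation and builds one second-order partial from three first-order ones; the function names and the six input correlations are hypothetical.

```python
import math


def partial_first(r_ab, r_ac, r_bc):
    """r_ab.c: correlation of a and b with variable c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1.0 - r_ac ** 2) * (1.0 - r_bc ** 2))


def partial_second(r12, r13, r14, r23, r24, r34):
    """r_12.34, built from three first-order partials that hold variable 3 constant."""
    r12_3 = partial_first(r12, r13, r23)
    r14_3 = partial_first(r14, r13, r34)
    r24_3 = partial_first(r24, r23, r34)
    # The same formula is applied again, now to the first-order partials.
    return partial_first(r12_3, r14_3, r24_3)


# Hypothetical zero-order correlations among four variables.
print(round(partial_first(0.60, 0.50, 0.10), 4))
print(round(partial_second(0.60, 0.50, 0.40, 0.30, 0.20, 0.10), 4))
```

Each rise in order needs three partials of the order below, so the number of coefficients to compute grows quickly with the number of variables, which is the burden the schemata discussed next try to reduce.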


SYSTEMATIZING PARTIAL CORRELATION METHODS 


Truman L. Kelley systematized Yule’s partial correlation procedure 
in 1914 so that one needed to find, for example, but seventy-eight 





1 Hooker, R. H. and Yule, G. U.: Note on Estimating the Relative Influence of 
Two Variables upon a Third. Journal of the Royal Statistical Society, Vol. LXIX, 
1906, pp. 197-200. 

2 Yule, G. U.: On the Theory of Correlation for Any Number of Variables
Treated by a New System of Notation. Proceedings of the Royal Society (Series A),
Vol. LXXIX, 1907, pp. 182-193.

3 Yule, G. U.: “Introduction to the Theory of Statistics,” seventh edition,
revised. Charles Griffin and Co., London, 1924.

4 Yule: Ibid., pp. 225-248.
partial r’s and one multiple r in a six-variable problem if one were
seeking the regression equation.¹ Yule’s complete method would
require two hundred forty partial r’s for a six-variable problem.
In 1917 Curt Rosenow made a further analysis,² reducing the number
of partial r’s to forty-five in a six-variable problem, but requiring four
incidental multiple r’s in the process. C. L. Huffaker’s schema³
requires forty-six partials, but only two multiples, one of which merely
serves as a check on the other. The schema for three-, four-, and five-
variable problems employed by Henry Garrett⁴ resembles Huffaker’s.
J. E. Bathurst has extended Huffaker’s schema to include seven and
eight variables.⁵ Yule’s complete method requires three hundred
fifteen partials for a seven-variable problem, and five hundred eighty- 
eight for an eight-variable. Bathurst requires eighty-three and one 
hundred twenty-nine respectively. In 1916 Kelley contributed a 
cleverly worked out set of tables to assist in solving partial correla- 
tions. The first edition of this was soon exhausted, however, and, 
for various reasons, it has never been reissued. 


RETURN TO PARTIAL REGRESSION METHODS 


In the herculean task of gathering the data and publishing the 
results of the Army psychological examinations a more direct 
method was required for finding the regression equations. Brown and 
May, at the suggestion of Karl Pearson and Raymond Pearl, used 
simultaneous linear equations with solution by determinants,⁷ as
Yule had done nearly a quarter century earlier. Kelley then turned 
his attention to methods for shortening the determinant solution of 
the simultaneous linear equations.





1 Kelley, T. L.: “Educational Guidance,” (first edition), Teachers College,
Columbia University Contributions to Education, No. 71, 1914.

2 Rosenow, C.: “The Analysis of Mental Functions.” Psychological Monographs,
Vol. XXIV, No. 5, 1917.

3 Huffaker, C. L.: A Contribution to the Technique of Partial Correlation.
Journal of Applied Psychology, Vol. VII, 1923, pp. 135-142.

4 Garrett, H. E.: “Statistics in Psychology and Education,” first edition.
Longmans, Green and Co., 1926, pp. 223-231, 240-251.

5 Bathurst, J. E.: A Partial Correlation Schema. Journal of Applied Psychology,
Vol. XI, 1927, pp. 155-164.

6 Kelley, T. L.: Tables: “To Facilitate the Calculation of Partial Coefficients of
Correlation and Regression Equations.” Bulletin No. 27, University of Texas, 1916.

7 Yerkes, R. M.: “Psychological Examining in the United States Army.”
Memoirs of the National Academy of Sciences, Vol. XV, Government Printing
Office, 1921.

In 1921 he published a nomogram¹ for facilitating computation of the partial
regression coefficients. That same year Clark L. Hull published a description of a
nomographic method for solving partial correlations,² but his alinement
board was suggested by the partial correlation technique and Kelley’s 
“Tables” of 1916 rather than by the partial regression method. In 
1923 E. R. Wood prepared nomograms for solving the formulas 
for partial correlation coefficients and the formulas for partial regres- 
sion coefficients. Wood’s charts constitute by far the best graphic 
solutions for these two formulas yet proposed, but unfortunately they 
have not yet been published commercially. Symonds, following 
Kelley, made a job-analysis of determinant solutions in three- and 
four-variable problems, and published charts based on this method.⁴
But solution by determinants proves troublesome beyond four
variables.⁵





1 Kelley, T. L.: “Chart to Facilitate the Calculation of Partial Coefficients of 
Correlation and Regression Equations,” first edition. School of Education, 
Special Monograph, No. 1, Stanford University Publications, 1921. 

Kelley, T. L.: “Alignment Chart of Correlation Functions.” Stanford University
Publications, 1921.

See also, Kelley, T. L.: “Statistical Method,” first ed. The Macmillan Co.,
1923, pp. 291-295 and inside back cover.

2 Hull, C. L.: A Device for Determining Coefficients of Partial Correlation.
Psychological Review, Vol. XXVIII, 1921, pp. 377-383.

3 Wood, E. R.: “A Chart for Obtaining Partial Correlations and Regression
Equations of Three or More Variables.” To be issued by the University of
Chicago Press.

4 Symonds, P. M.: Job-analysis Sheet for Computing Partial and Multiple
Coefficients of Correlation and Regression Coefficients. Teachers College Record,
Vol. XXVIII, 1925, pp. 52-69.

Symonds, P. M.: “Partial and Multiple Correlation Chart. Three Variables.”
Teachers College, Columbia University, Bureau of Publications.

Symonds, P. M.: “Partial and Multiple Correlation Chart. Four Variables.”
Not now listed for sale.

5 For methods of solution by determinants see, Whittaker, E. T. and Robinson,
G.: “The Calculus of Observations,” second edition. Blackie and Son, London,
1926, pp. 71-77, 231-234. The first reference is to Chio’s method of solution.

Deming, H. G.: A Systematic Method for the Solution of Simultaneous
Linear Equations. American Mathematical Monthly, Vol. XXXV, 1928, pp. 360-363.
An orderly application of Chio’s method.

Hanus, P. H.: “Elementary Treatise on The Theory of Determinants,” first
edition. Ginn and Co., 1886. Largely follows the methods and practice of Muir.

Salmon, G.: “Lessons Introductory to the Modern Higher Algebra,” fourth
edition. Hodges, Figgis and Co., Dublin, 1885. Fourth edition reprint, G. E.
Stechert, New York, 1924. A thorough introduction to determinants following
the methods and practices of A. Cayley and J. J. Sylvester.
ITERATION METHODS OF SOLUTION


There are, however, other ways by which simultaneous linear equa-
tions may be solved. Carr¹ lists some six or seven direct algebraic
methods. There are also certain indirect methods of reaching the
results by iteration, or successive approximations.² Recently,
Kelley and Salisbury have advanced another such method³ which
they claim will reduce the labor of computation in a sixteen-variable
problem at least ninety-five per cent over determinant methods.
Tolley and Ezekiel have shown, however, that the Kelley-Salisbury
iteration method in solving a six-variable problem for partial regression
and multiple correlation coefficients is far inferior to the Doolittle
method.⁴ Kelley grants the superiority of the Doolittle method
“for a small number of variables, perhaps up to 10,” and recommends
its use; but he insists that the recent improvements on his iteration
method make it the swifter instrument where a great number of vari-
ables are involved.⁵
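
To make "iteration, or successive approximations" concrete, the sketch below solves a small set of normal equations by Gauss-Seidel sweeps. This is a generic textbook scheme chosen for illustration; it is not the Kelley-Salisbury procedure, whose details are not given here, and the correlations used are hypothetical.

```python
def gauss_seidel(a, b, sweeps=500, tol=1e-12):
    """Solve a x = b by repeated sweeps, refining one unknown at a time."""
    n = len(b)
    x = [0.0] * n  # start from a trial solution of all zeros
    for _ in range(sweeps):
        largest_change = 0.0
        for i in range(n):
            others = sum(a[i][j] * x[j] for j in range(n) if j != i)
            new_value = (b[i] - others) / a[i][i]
            largest_change = max(largest_change, abs(new_value - x[i]))
            x[i] = new_value
        if largest_change < tol:
            break  # successive approximations have stopped moving
    return x


# Hypothetical intercorrelations of three tests and their criterion correlations.
r_xx = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
r_0x = [0.6, 0.5, 0.4]
print([round(beta, 4) for beta in gauss_seidel(r_xx, r_0x)])
```

Each sweep is cheap, but the number of sweeps needed depends on how strongly the variables intercorrelate, which is one reason the comparison with a direct method can go either way.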


THE DOOLITTLE METHOD


The Doolittle method is a variation of Gauss’s direct method 
of solving simultaneous linear equations by substitution. Gauss 





1 Carr, G. S.: “A Synopsis of Elementary Results in Pure Mathematics.”
Francis Hodgson, London, 1886, pp. 42ff.

2 Kelley, T. L.: “Statistical Method.” Pp. 302-310.

Whittaker and Robinson: Op. cit., pp. 255ff.

Edgeworth, F. Y.: A New Method of Reducing Observations Relating to
Several Quantities. Philosophical Magazine Ser. 5, Vol. XXIV, pp. 222-223,
466-479; Vol. XXV, 1888, pp. 184-191.

Ford, L. R.: The Solution of Equations by the Method of Successive Approximations.
American Mathematical Monthly, Vol. XXXII, 1925, pp. 272-287.

3 Kelley, T. L. and Salisbury, F. S.: Iteration Method for Determining Multiple
Correlation Constants. American Statistical Association, Vol. XXI, 1926, pp.
282-292.

Salisbury, F. S.: “A Simplified Method of Computing Multiple Correlation
Constants.” Journal of Educational Psychology, Vol. XX, 1929, pp. 44-52. An
improvement on the iteration method explained in the preceding article.

4 Tolley, H. R. and Ezekiel, M. M. B.: The Doolittle Method for Solving Multiple
Correlation Equations vs. the Kelley-Salisbury Iteration Method. American
Statistical Association, Vol. XXII, 1927, pp. 497-500.

5 Kelley, T. L. and McNemar, Q.: Doolittle vs. the Kelley-Salisbury Iteration
Method for Computing Multiple Regression Coefficients. American Statistical
Association, Vol. XXIV, 1929, pp. 164-169.







developed several methods of solution, some direct, others indirect.¹
M. H. Doolittle of the United States Coast and Geodetic Survey made
various improvements on Gauss’s method of direct substitution, and
in 1878 published an account of his method.² Doolittle’s method
replaced Schott’s version of Gauss’s approximation method in the work
of the Coast and Geodetic Survey,³ and found favor among engineers.⁴
In 1923, Howard R. Tolley and Mordecai M. B. Ezekiel of the United
States Bureau of Agricultural Economics introduced this method for
the solution of the partial regression coefficients (Kelley’s β’s) into
statistical practice,⁵ in which the writers adopted the novel method
of using the mean product sum, $p_{01} = \Sigma_{01}/N$, etc., in the simultaneous
linear equations, instead of the zero-order coefficients as is the general
practice. Mills adopted the method as there presented for his text
on economic and business statistics published the following year.⁶
Hull also uses the entire method as outlined by Tolley and Ezekiel in
his 1928 book, “Aptitude Testing.”⁷ In 1925 Wallace and Snedecor





1 For some of Gauss’s methods see, Merriman: Op. cit., eighth edition. Pp. 
181-187. 

Whittaker and Robinson: Op. cit., pp. 234-236, 257-258. 

Encke, J. F.: Ueber die Methode der Kleinsten Quadrate. Astronomische 
Jahrbuch, Berlin, 1835, pp. 267-272; 1836, p. 265. Gauss’s direct process for 
solution by elimination. 

Jacobi, K.: “Astronomische Nachrichten,” Altona, No. 523, 1845, p. 297.
Gauss’s method of successive approximations.

Schott, C. A.: Solution of Normal Equations by Indirect Elimination.
Coast Survey Report, 1855, pp. 255-264. Gauss’s indirect process of elimination by
successive trials and approximations revised and systematized for use in the coast
survey. This method was largely used there for many years until replaced
by the Doolittle method.

2 Doolittle, M. H.: Method Employed in the Solution of Normal Equations
and the Adjustment of a Triangulation. Coast and Geodetic Survey Report, 1879,
pp. 115-120.

3 See, Wright, T. W. and Hayford, J. F.: “Adjustment of Observations by
Methods of Least Squares,” second edition. D. Van Nostrand Co., New York,
1906, preface.

Also, Adams, O. S.: Application of the Theory of Least Squares. Special
Publication No. 28. Coast and Geodetic Survey, 1915.

4 Leland, O. M.: “Practical Least Squares,” first edition. McGraw-Hill Book
Co., New York, 1921.

5 Tolley, H. R. and Ezekiel, M. M. B.: A Method of Handling Multiple Correlation
Problems. American Statistical Association, Vol. XVIII, 1923, pp. 993-1003.

6 Mills: Op. cit., pp. 491ff., 576-581.

7 Hull, C. L.: “Aptitude Testing,” first edition. World Book Co., 1928.
used the method, but with zero-order coefficients instead of mean product
sums, in their little manual for agricultural research workers.
Garrett, 1928, saw that zero-order coefficients could be substituted for 
the mean product sums, but he evidently was unfamiliar with Wallace 
and Snedecor’s study, and he also seems to have failed to investigate 
Tolley and Ezekiel’s reference to the Doolittle method of solution.²
The writer would recommend Wallace and Snedecor’s study to educa- 
tional psychologists as the clearest and most adequate presentation 
of the Doolittle method now available. 
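
The direct approach can be sketched in a few lines. The code below is plain forward elimination followed by back substitution, that is, the Gauss direct method the text names as the basis of Doolittle's procedure; it does not reproduce Doolittle's tabular layout or its running checks, and the correlations are hypothetical. Zero-order coefficients are used in the equations, as in the Wallace and Snedecor variant described above.

```python
import math


def solve_direct(a, b):
    """Solve a x = b by forward elimination and back substitution (no pivoting)."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Hypothetical intercorrelations of three tests and their criterion correlations.
r_xx = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
r_0x = [0.6, 0.5, 0.4]

betas = solve_direct(r_xx, r_0x)
# With standardized variables, R squared is the sum of beta times criterion r.
multiple_r = math.sqrt(sum(beta * r for beta, r in zip(betas, r_0x)))
print([round(beta, 4) for beta in betas], round(multiple_r, 4))
```

Skipping pivoting is safe here only because a nonsingular correlation matrix is positive definite; a general-purpose solver would pivot.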


CONCLUSION 


Thus we have seen that the methods for obtaining the regression 
equations and the coefficient of multiple correlation were hampered 
for many years by inadequate and cumbersome methods of solution. 
Now that a synthesis of the most economical statistical method for 
obtaining the regression equations, the partial regression method, has 
been effected with the most economical engineering method for 
solving simultaneous linear equations, the Doolittle method, we may
expect that the multiple correlation and prediction technique will be 
employed to a much greater extent in educational psychology than in 
the past. 





1 Wallace, H. A. and Snedecor, G. W.: “Correlation and Machine Calculation.”
Official publication, Vol. XXIII, No. 35, Iowa State College of Agriculture, 1925.

2 Garrett, H. E.: A Modification of Tolley and Ezekiel’s Method of Handling
Multiple Correlation Problems. Journal of Educational Psychology, Vol. XIX,
1928, pp. 45-49.
THE SHRINKAGE OF THE COEFFICIENT OF
MULTIPLE CORRELATION¹


SELMER C. LARSON 


Carleton College, Northfield, Minnesota 


INTRODUCTION 


It has been recognized by theoretical statisticians for some time 
that when the coefficient of multiple correlation (R) is derived for a 
given set of data, its value is likely to be deceptively large. If the 
computations have been correct, the value will hold rigidly for the set 
of data from which the regression equation was derived. If, however, 
the equation should be applied to a second set of data, even though 
strictly comparable, it has been supposed that the yield in this latter 
case would, except for errors due to sampling, be less than in the first. 
Moreover, it has been supposed that the more variables contained 
in the regression equation, the greater this shrinkage will be. This is 
particularly significant because ordinarily the practical employment 
of a regression equation involves its use with data other than those 
from which it was derived. If this shrinkage should turn out to be 
very large, the building of multiple regression equations might well be 
abandoned. The matter is therefore one of considerable importance, 
both theoretically and practically. Several attempts have been made 
by statisticians to derive a formula which should indicate the amount 
of this shrinkage. The most promising one of these will be considered 
in the present paper. So far as the writer has been able to discover, 
no one has attempted to determine experimentally the actual amount 
of shrinkage. The present report describes such an attempt in the 
field of psychological testing. 

A study of the shrinkage is made by using a regression equation
derived from one group of subjects to predict the criterion scores of a
second group. The correlation yielded by this procedure is subtracted
from the yield obtained by predicting the criterion scores of the second
group by means of a regression equation derived from themselves.
This shrinkage is studied with the number of independent variables
in the regression equation ranging from one to ten and for a
variety of different criteria. A comparison is then made between





1 From the Psychological Laboratory, University of Wisconsin. The writer is 
greatly indebted to Professor M. V. O’Shea for permission to use data selected from 
the results obtained by the Mississippi Survey. 
such empirical findings and the results obtained by applying a recently
proposed formula for determining the same type of shrinkage. 
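
The procedure just described can be sketched on synthetic data. Everything in the block is an assumption made for illustration: the two groups are simulated rather than drawn from the Mississippi survey, the criterion depends on only two of the ten simulated tests, and the helper names are invented. The shrinkage is the correlation obtained in the second group with its own equation minus the correlation obtained there with the first group's equation.

```python
import math
import random


def solve(a, b):
    """Gaussian elimination with back substitution (adequate for normal equations)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(rows, y):
    """Least-squares weights, intercept first, from the normal equations."""
    cols = [[1.0] * len(rows)] + [list(c) for c in zip(*rows)]
    a = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(a, b)


def predict(weights, rows):
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], row)) for row in rows]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


def make_group(n, k, rng):
    """Simulate n subjects with k test scores and one criterion score."""
    rows, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(k)]
        rows.append(row)
        y.append(0.5 * row[0] + 0.3 * row[1] + rng.gauss(0.0, 1.0))
    return rows, y


rng = random.Random(0)
rows_1, y_1 = make_group(200, 10, rng)  # group supplying the borrowed equation
rows_2, y_2 = make_group(200, 10, rng)  # group on which both equations are tried

r_own = pearson(predict(fit(rows_2, y_2), rows_2), y_2)
r_cross = pearson(predict(fit(rows_1, y_1), rows_2), y_2)
shrinkage = r_own - r_cross
print(round(r_own, 3), round(r_cross, 3), round(shrinkage, 3))
```

Because a least-squares equation gives the highest correlation any linear combination can reach in its own sample, the difference computed this way cannot be negative; the empirical question is how large it is.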


SOURCE AND SELECTION OF DATA


Something like 30,000 pupils were given mental and achievement 
tests in a survey of the school system of the State of Mississippi. 
For the present study, the test scores of eight hundred high school 
pupils from this number were used. The scores of pupils from the large 
and medium-sized high schools were chosen in the belief that the level 
of instruction would be more nearly uniform.² The entire population
of eight hundred consisted of four groups—two hundred boys in each 
of two groups and two hundred girls in each of the two remaining 
groups. The subjects to make up these groups were chosen from those 
tested by the survey in such a way that each of the four contained 
exactly the same number of individuals drawn from any particular 
class of each school sampled. Otherwise the placing of the subjects 
in the several groups was entirely at random. By making up the 
personnel of the groups in this manner it was felt that they would be 
as exactly comparable in regard to general level and range of natural 
endowment, culture, and educational opportunities as possible.

Each pupil had eighteen scores entered after his name. The 
designations are as follows: 


X1  English                    X10  Logical selections (Terman)
X2  Mathematics                X11  Arithmetic (Terman)
X3  Science                    X12  Sentence meaning (Terman)
X4  History                    X13  Analogies (Terman)
X5  Chronological age          X14  Mixed sentences (Terman)
X6  Intelligence quotient      X15  Classifications (Terman)
X7  Information (Terman)       X16  Number series (Terman)
X8  Best answer (Terman)       X17  Total Terman
X9  Word meaning (Terman)      X18  Total Iowa


The first four X’s together with X18 are scores made on the Iowa
High School Content Examination. X7 to X17 are scores made on the
Terman Group Test of Mental Ability. Another column—the sum 
of each row—was added for checking purposes. 





1 Ezekiel, M. J. B.: An unpublished paper read before the Mathematical Society 
at its annual meeting in Chicago in December, 1928. 

2 O’Shea’s study showed that for the state as a whole scholastic achievement in 
the small high schools was decidedly lower than in the larger ones. 
EMPIRICAL DETERMINATION OF SHRINKAGE


Part I 


Ten distinct regression equations with English as the criterion 
were derived from the data for the first group of boys. Ten parallel 
regression equations were derived from the data for the second group 
of boys. In the case of the first group, the first equation had all 
ten independent variables (tests). The second equation had the best 
nine test variables, i.e., those having the highest criterion correlations. 
The third had the best eight test variables, and so on down to the tenth 
equation which was based on the single test having the highest criterion 
correlation. The same procedure was followed with the second group 
of boys, except that in this case the same test variables were used in 
the corresponding equations as were used with the first group. Owing 
to a natural variability in the size of the zero order correlation coeffi- 
cients from sample to sample, the tests successively excluded from the 
progressively smaller equations with the second group of boys were not 
in all cases the next in order of weakness in the criterion r’s. 
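
The construction of the ten nested equations may be sketched as follows. This is only an illustration under stated assumptions: the scores are synthetic (generated here, not taken from the study), and the helper names `multiple_r` and `nested_equations` are hypothetical. The tests are ranked by their zero order correlation with the criterion, and the multiple R is computed for the best ten, the best nine, and so on down to the single best test.

```python
import numpy as np

def multiple_r(X, y):
    """Multiple correlation: r between y and its least-squares estimate from X."""
    A = np.column_stack([np.ones(len(y)), X])      # add the constant term
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.corrcoef(A @ coef, y)[0, 1])

def nested_equations(X, y):
    """Return (k, test indices, R) for the best k tests, k = all tests down to 1."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-r)                         # highest criterion r first
    out = []
    for k in range(X.shape[1], 0, -1):
        cols = order[:k]
        out.append((k, cols.tolist(), multiple_r(X[:, cols], y)))
    return out

# Synthetic stand-in for one group: 200 pupils, 10 tests, 1 criterion (assumed).
rng = np.random.default_rng(0)
g = rng.normal(size=(200, 1))                      # common ability factor
X = 0.6 * g + rng.normal(size=(200, 10))           # ten test scores
y = g[:, 0] + rng.normal(size=200)                 # criterion score

results = nested_equations(X, y)
for k, cols, R in results:
    print(k, round(R, 4))
```

Because each smaller set of tests is contained in the larger one, the R obtained on the same data from which the equation was derived can never rise when a test is dropped; the interest lies in how little it falls.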

TABLE I.—ZERO ORDER COEFFICIENTS OF CORRELATION FOR BOTH SETS OF BOYS 

The Bold Face Figures at the Upper Right Are the Coefficients of Correlation 
for Boys, Set No. II; and the Light Face Figures at the Lower Left Are the 
Coefficients for Boys, Set No. I. At the Top and Left of Table Are Indicated 
the Variables Whose Notations Are Outlined Earlier in the Text. This Table 
Shows How English (X1) Correlates with Each of the Items in the Terman Test 
and Also the Intercorrelations between the Various Items. 

        X1       X7       X8       X9       X10      X11      X12      X13      X14      X15      X16      X17
X1    1.000     .680   [illeg.]   .740     .681     .367   [illeg.]   .456     .418     .489     .204     .718
X7     .699    1.000     .714     .697     .631     .425     .565     .464     .451     .516     .264     .812
X8     .499     .617    1.000     .608     .617     .447   [illeg.]   .488     .489     .442     .349     .786
X9     .704     .621     .469    1.000     .587     .348     .587     .446     .527     .581     .247     .837
X10    .524     .612     .498     .580    1.000     .384     .466     .460     .391     .484     .286     .737
X11    .366     .346     .272     .392     .382    1.000     .280     .406     .210     .366     .416     .612
X12    .506     .477     .414     .567     .407     .382    1.000     .297     .892     .856     .204     .688
X13    .503     .515     .492     .497     .451     .438     .421    1.000     .345     .362     .450     .653
X14    .499     .525     .408     .502     .427     .327     .500     .386    1.000     .483     .254     .634
X15    .474     .516     .400     .505     .470     .335     .390     .499     .359    1.000     .288   [illeg.]
X16    .255     .316     .334     .350     .424     .553     .342     .534     .280     .418    1.000   [illeg.]
X17    .714     .765     .670     .812     .730     .643   [illeg.]   .728     .664     .656     .659    1.000

Space is lacking for the presentation either of the means and 
standard deviations needed for the derivation of the regression equations 
or for the regression equations themselves.¹ In order that the 
interested reader may study the relationship between the original 
correlations and the several multiple correlation yields, there has been 
placed in Table I the entire series of zero order correlation coefficients 
for both groups of boys. The coefficients of the respective groups are 
distinguished by means of contrasting type faces. All of the coeffi- 
cients are positive. 

The multiple correlation coefficients derived from the results shown 
in Table I are given in Table II. The R’s corresponding to the several 
multiple regression equations derived from the boys of Group I are 
shown in row A and those for Group II in row D. These values are 
the correlation coefficients which would have been obtained if in 
each case the test scores from which the regression equation was 
derived had been substituted appropriately in the equation itself and 
the resulting criterion estimates had been correlated with the true 
criterion in the ordinary way. Actually, these values were obtained 
by means of the usual formula which is decidedly simpler. The 
coefficients in both series are distinctly high, as aptitude correlation 
yields run. It is noteworthy, however, that in both series alike, after 
three tests have been included in the equation, the addition of all the 
remaining seven tests suffices to raise the correlation yield a total of 
barely a single point in the second decimal place. 

The next step in the process was to substitute the actual test 
scores of the boys of Group I in the equations derived from Group II 
and to correlate the resulting criterion estimates with the true criterion 
scores of Group I. The resulting series of coefficients is given in Table 
II, row B. The procedure was then reversed. The true criterion 
of Group II was correlated with the criterion estimates obtained by 
substituting their relevant test scores in the equations derived from 
the scores of Group I. The resulting coefficients are given in row E. 
We now have the values from which shrinkages may be determined. 
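
The cross-application just described can be sketched in a few lines. The sketch assumes two synthetic groups of 200 cases and ten tests each (the data and the helper names `fit`, `predict` and `make_group` are illustrative, not the study's). Each group's scores are substituted in its own equation (rows A and D) and in the equation derived from the other group (rows B and E); the differences are the shrinkages.

```python
import numpy as np

def fit(X, y):
    """Least-squares regression weights; element 0 is the constant term."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Criterion estimates from test scores X and regression weights coef."""
    return coef[0] + X @ coef[1:]

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def make_group(rng, n=200, p=10):
    """Synthetic test scores and criterion sharing one common factor (assumed)."""
    g = rng.normal(size=n)
    X = 0.6 * g[:, None] + rng.normal(size=(n, p))
    y = g + rng.normal(size=n)
    return X, y

rng = np.random.default_rng(1)
X1, y1 = make_group(rng)          # Group I
X2, y2 = make_group(rng)          # Group II
b1, b2 = fit(X1, y1), fit(X2, y2)

row_A = corr(predict(b1, X1), y1)  # Group I scores, Group I equation
row_B = corr(predict(b2, X1), y1)  # Group I scores, Group II equation
row_D = corr(predict(b2, X2), y2)  # Group II scores, Group II equation
row_E = corr(predict(b1, X2), y2)  # Group II scores, Group I equation
shrink_C = row_A - row_B
shrink_F = row_D - row_E
print(round(shrink_C, 4), round(shrink_F, 4))
```

Since the least-squares equation gives the highest correlation attainable by any linear combination of the tests on the data from which it was derived, neither shrinkage computed in this way can be negative; only its size is an empirical matter.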

According to the a priori expectation as indicated above, the values 
in row B should show a perceptible shrinkage when compared with the 
values in row A and similarly with row E when compared with row D. 
A brief comparison of the R values in the two pairs of rows shows that, 
except for the equation containing but a single test variable, this 
expectation is realized. The amount of the shrinkage is shown in 





1 These are given in detail for the entire study in the author’s dissertation filed 
in the library of the University of Wisconsin. It is entitled “Studies in Aptitude 
Forecasting with the Multiple Regression Equation.” 
row C for the several pairs of values of Group I, and for Group II 
in row F. The mean values for rows C and F are given in row G. 
A glance at these latter values shows at once that, despite a certain 
amount of variability presumably due to sampling errors, there is a 
clear tendency for the shrinkage to increase with the increase in the 
number of independent (test) variables in the regression equation. 
This again is in harmony with what has been believed by statistical 
theorists. A graphic representation of the mean shrinkage values 
shown in row G is presented as the solid line in Fig. 1. 

We have seen that the increase in the size of R, even when regression 
equations are applied to the same data from which they are 
derived, is extremely slight and grows less and less as the number of 
test variables increases. We have also seen that under the same 
condition the amount of shrinkage in yield from such equations when 
in actual use grows greater and greater. The question naturally 
arises whether there may not be a point beyond which the increase in 
R resulting from the addition of a new test variable may not be more 
than offset by the increase in shrinkage, so that the true functional 
value of such a test battery or other estimating aggregate may not be 
actually less than if the test or other independent variable had not 
been added. An examination of rows B and E shows that in the 
case of both groups alike such a critical point is reached at the eighth 
test added. In both cases the batteries would have given an absolutely 
higher forecasting yield with two of the tests left out. The moral of this is 
that under some circumstances the inclusion of certain tests in an 
aptitude battery after it has reached some size may not only entail 
the waste of energy and materials of administering the test, but may 
actually reduce the yield, and this even when the best possible method 
of weighting the tests is employed. 


Part II 


To secure further empirical evidence as to the amount of shrinkage 
from the ordinary multiple correlation coefficient, further computations 
were undertaken in all of which the number of test variables was kept 
constant. The number chosen was the maximum for this study, ten. 
The multiple regression equations on mathematics, science, 
history, etc., for the boys of Group I were used to estimate the 
corresponding true criterion scores of Group II. The same procedure was 
followed for the girls. 


TABLE III.—SHOWING THE SHRINKAGE OF R’s WHEN THE SAME (10) TEST 
VARIABLES ARE USED BUT DIFFERENT ACADEMIC SUBJECTS ARE 
EMPLOYED AS CRITERIA WITH DIFFERENT CORRELATION YIELDS 

                                              English  Mathematics  Science  History  Total Iowa

Boys’ Group II. 
  Correlation yield (R) from equations 
  derived from the subjects’ own scores ... A   .7869     .6481      .5689    .7719     .8200
  Correlation yield (R) from equations 
  derived from the scores of Group I ...... B   .7786     .6148      .5230    .7505     .8098
  Shrinkage ............................... C   .0083     .0283      .0459    .0214     .0102

Girls’ Group II. 
  Correlation yield (R) from equations 
  derived from the subjects’ own scores ... D   .7989     .6403      .4548    .7115     .7875
  Correlation yield (R) from equations 
  derived from the scores of Group I ...... E   .7665     .6226      .4219    .6786     .7755
  Shrinkage ............................... F   .0324     .0177      .0329    .0329     .0120

The results are given in Table III which is constructed in a manner 
comparable to Table II above. With the number of test variables 
constant, this table enables us to observe the influence upon the amount 
of shrinkage of the strength of the natural tendency to correlation in 
the test data involved. If we divide the ten shrinkages found in this 
series into two groups on the basis of the size of the original R’s, 
we find that the average shrinkage for the five largest R’s is .0169 
whereas that for the five smallest R’s is .0315. The tendency for the 
weaker sets of data to yield the larger shrinkages is evident. For the 
two lowest values (Science) this amounts to .0394, a very appreciable 
amount. 
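
The averages quoted in this paragraph can be checked directly from the figures of Table III. The sketch below takes the ten original R’s (rows A and D) and the ten printed shrinkages (rows C and F) as they appear in this copy, ranks the pairs by the size of the original R, and averages the shrinkages of the upper and lower halves. The variable names are illustrative only.

```python
# Rows A and D of Table III: R's from the subjects' own equations
# (boys then girls; English, Mathematics, Science, History, Total Iowa).
own_R = [.7869, .6481, .5689, .7719, .8200,
         .7989, .6403, .4548, .7115, .7875]

# Rows C and F of Table III: the printed shrinkages, in the same order.
shrinkage = [.0083, .0283, .0459, .0214, .0102,
             .0324, .0177, .0329, .0329, .0120]

# Rank the ten (R, shrinkage) pairs from the largest original R to the smallest.
pairs = sorted(zip(own_R, shrinkage), reverse=True)

mean_large = sum(s for _, s in pairs[:5]) / 5   # five largest R's
mean_small = sum(s for _, s in pairs[5:]) / 5   # five smallest R's
mean_science = (.0459 + .0329) / 2              # the two Science shrinkages

print(round(mean_large, 4), round(mean_small, 4), round(mean_science, 4))
```

The three means agree, to the fourth decimal place as rounded in the text, with the values .0169, .0315 and .0394 given above.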


THE SMITH SHRINKAGE-DEDUCTION FORMULA 


A promising correction formula has been developed to apply 
to the coefficient of multiple correlation. A paper containing the 
formula was read by M. J. B. Ezekiel at the December 1928 meeting 
of the American Mathematical Society held at Chicago. This 
formula is 

$$\bar{R}^2 = 1 - \frac{1 - R^2}{1 - \dfrac{m}{n}}$$

where $\bar{R}$ = the estimated correlation obtaining in the universe 
R = the observed correlation 
m = the number of variables, dependent and independent 
n = the number of observations (statistical population) 


Ezekiel gives the credit for developing this formula to B. B. Smith. 

At the completion of the computations described in the preceding 
section it was a relatively simple task to substitute in the above formula 
and determine for the various observed R’s, the corresponding estimates 
of the “correlation obtaining in the universe.” These values 
are given in Table IV for the R’s obtained in Part I. Table V shows 
the corresponding values of the R’s obtained in Part II. Subtraction 
from the original R’s shown in Tables II and III respectively yields 
the amounts of shrinkage which would be anticipated by the formula 
in each case. 
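
As an illustration of the substitution, the sketch below assumes that the Smith formula has the form R̄² = 1 − (1 − R²)/(1 − m/n). That form is an assumption here: it is consistent with the definitions of m and n given above and with the condition under which the article says the formula breaks down, but the function name and the code are not the author's. The values m = 11 and n = 200 are those stated later in the article.

```python
import math

def smith_estimate(R, m, n):
    """Estimated universe correlation under the assumed Smith form.

    R: observed multiple correlation
    m: number of variables, dependent and independent
    n: number of observations
    """
    r2 = 1.0 - (1.0 - R * R) / (1.0 - m / n)
    if r2 < 0:
        # The square root would be imaginary; the formula gives no estimate.
        raise ValueError("formula breaks down: R^2 is smaller than m/n")
    return math.sqrt(r2)

# Boys, English criterion: observed R = .7869 (Table III, row A).
est = smith_estimate(0.7869, m=11, n=200)
anticipated_shrinkage = 0.7869 - est
print(round(est, 4), round(anticipated_shrinkage, 4))
```

For this case the sketch reproduces, to the fourth decimal, the first entries printed for the boys in Table V. Some of the other printed entries differ from the sketch in the third or fourth decimal place, so the agreement should be read as approximate only.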

In Table IV row E shows the mean amount of shrinkage for each 
pair of observed R values for the several numbers of test variables. 
It is plotted in Fig. 1 for the purpose of easy comparison with the 
empirically determined shrinkages. It is clear that the formula 
definitely parallels the empirical findings. It conforms to the observed 
tendency for the amount of shrinkage to increase with the number of 
variables involved in the equation. There is a well-marked tendency, 
however, for the Smith formula to indicate materially larger shrinkages 
than the empirical results show. 

Passing to Table V, we may make the comparison with the empirical 
results by treating the shrinkages the same as those in Table III. 


TABLE V.—SHOWING THE SHRINKAGE OF R’s AS INDICATED BY THE SMITH 
FORMULA, WHERE THE NUMBER OF VARIABLES IS CONSTANT BUT THE 
SIZE OF THE R’s VARY RATHER WIDELY 

                                              English  Mathematics  Science  History  Total Iowa

Boys’ Group II. 
  Correlation obtaining in the universe (R) 
  as estimated by the Smith formula ....... A   .7727     .6156      .5330    .7563     .8094
  Shrinkage obtained by subtracting row A 
  above from row A in Table III ........... B   .0142     .0275      .0359    .0156     .0106

Girls’ Group II. 
  Correlation obtaining in the universe (R) 
  as estimated by the Smith formula ....... C   .7843     .6132      .4005    .6916     .7730
  Shrinkage obtained by subtracting row C 
  above from row D in Table III ........... D   .0146     .0271      .0543    .0199     .0145

Computation shows that the mean shrinkage of the five largest R’s 
is .0139 whereas that for the five smallest is .0329. Here again we 
observe, as in Table IV, that the shrinkages yielded by the formula 
are materially larger than those found empirically. An analysis of the 
formula reveals that the shrinkage increases as the size of the obtained 
R decreases. The formula will break down, however, when m/n > R², 
as the values will then become imaginary. In this situation, with m 
equal to 11 and n equal to 200, the formula will give imaginary values 
when the absolute values of R are less than .235. 
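
Both statements of this paragraph lend themselves to a short numerical check. The sketch below uses only what the text gives: the break-down condition m/n > R² with m = 11 and n = 200, and the shrinkages printed in Table V together with the original R’s of Table III. The variable names are illustrative.

```python
import math

# Break-down point: the estimate is imaginary when R^2 < m/n, i.e. |R| < sqrt(m/n).
m, n = 11, 200
threshold = math.sqrt(m / n)

# Original R's (Table III, rows A and D) and the formula shrinkages
# (Table V, rows B and D), boys then girls, in the same column order.
own_R = [.7869, .6481, .5689, .7719, .8200,
         .7989, .6403, .4548, .7115, .7875]
formula_shrinkage = [.0142, .0275, .0359, .0156, .0106,
                     .0146, .0271, .0543, .0199, .0145]

pairs = sorted(zip(own_R, formula_shrinkage), reverse=True)
mean_large = sum(s for _, s in pairs[:5]) / 5   # five largest R's
mean_small = sum(s for _, s in pairs[5:]) / 5   # five smallest R's

print(round(threshold, 3), round(mean_large, 4), round(mean_small, 4))
```

The threshold comes to about .2345, which the text rounds to .235, and the two means agree with the .0139 and .0329 quoted above.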

SUMMARY AND CONCLUSIONS 


1. The present investigation has shown that the theoretically 
expected shrinkage of R as derived by the multiple correlation formula 
is a fact. 

2. The shrinkage is found to increase as the number of test variables 
increases. 

3. The shrinkage is also found to increase as the size of R decreases. 

4. The Smith shrinkage-deduction formula parallels all of the above 
empirical findings, but quite consistently gives values which are in 
excess of those obtained under the present experimental conditions. 

5. The empirically observed shrinkage increases at such a rate with 
the increase in the number of test variables that one of the most widely 
known scholastic aptitude tests actually shows a lower correlation 
yield with a criterion when ten test units are used than when only 
eight are employed. This suggests that test batteries may have very 
definite limitations as to size. 

THE INFLUENCE OF BLOOD RELATIONSHIP AND 
COMMON ENVIRONMENT ON MEASURED 
INTELLIGENCE 


VERNER MARTIN SIMS 


University of Alabama 


One of the favorite modes of attack used in the investigation of the 
relative influence of heredity and environment on intelligence as meas- 
ured by the tests of today has been through a consideration of the 
resemblance of siblings (brothers and sisters) in the trait or traits 
measured by these tests. These studies are numerous¹ and have 
invariably shown a decided correlation between the intelligence of 
siblings. Interestingly enough, this correlation coefficient is about 
the same as the correlation found between the physical characteristics 
(eye color, cephalic index, stature, etc.) of siblings—approximately 
.50. However, this similarity between the physical and intellectual 
resemblance of siblings does not necessarily mean that the relationship 
is determined by the same causative factors. The physical char- 
acteristics are seemingly uninfluenced by environment, but the siblings 
are certainly subjected to a common environment, and, until evidence 
is produced that this is a negligible factor in the case of intelligence, 
the correlation has no significance. In recent years this fallacy has 
been pointed out time and again. It but remains to attempt the 
determination of the relative contributions of common environment 
and blood relationship to the correlation which is found. 

One of the most satisfactory approaches made to this problem 
that has come to the writer’s attention is contained in the Chicago 
study reported in the 1928 Year-book of the National Society for the 
Study of Education.² In this study the correlation between the 
intelligence of siblings reared in the same home was compared with 
that of siblings who were separated before either child was six years 
of age. Instead of the usual correlation of .50 a correlation of .32 ± .05 





1 For careful summaries of the most important of these studies see: Burks, Barbara: 
A Summary of Literature on the Determiners of the Intelligence Quotient and the 
Educational Quotient. Twenty-seventh Yearbook of the National Society for the 
Study of Education, 1928, Part II, Chap. XVI, pp. 252-261. 

2 Freeman, Holzinger, and Mitchell: The Influence of Environment upon 
Intelligence, School Achievement and Conduct of Foster Children. Twenty-seventh 
Yearbook of the National Society for the Study of Education, 1928, Part I, pp. 128- 
135. 
(by age pairing) or .25 ± .06 (by double entry pairing) was found 
between the intelligence of one hundred twenty-five pairs of siblings 
who had been separated for an average of seven years and four months 
and had an average age of five years and four months at the time of 
the separation. This seems to be rather convincing evidence of a 
decided environmental influence. The chief weakness of the study 
lies in the fact that all the environmental influences have not been 
eliminated, since these siblings had for a period of years been subjected 
to a common environment. The more than five years spent in the 
same home probably accounted for a part of the correlation found. 
In fact, present day tendencies in psychological theory would seem 
to indicate that the most significant period of their life had been 
spent in a similar environment. From the data which they had at 
hand it was impossible to determine the extent of this influence. 

The study reported in this paper represents a different attack on the 
same general problem. The procedure used here in the attempt to 
answer the question of the relative influence upon intelligence of blood 
relationship and common environment was to compare the correlation 
between the intelligence of pairs of siblings from the same home with 
the correlation between pairs of unrelated children, the unrelated 
pairs being equated with the sibling pairs on the basis of age, school 
attended, and home background. The significance of the correlation 
between the intelligence of siblings can only be interpreted in the light 
of information as to the correlation that would be found if the members 
of the pairs were unrelated. Presumably children paired at random 
would show no resemblance, but one cannot compare paired siblings 
with random pairs and account for the differences in terms of inherit- 
ance. It is only when environmental influences are equal that com- 
parisons have meaning. In this study the attempt has been made 
to equate two sets of paired children on the basis of environment, the 
members of each pair in one set having the same parents while in the 
other the members of a pair have different parents. To the extent 
that these conditions have been met, the difference in the degree of 
relationship between the members of the first set compared with the 
second set can be considered as a measure of the influence of common 
parentage. 

During the spring of 1927 the writer cooperated in the testing of all 
children found in Grades V to XI in five school systems in Lincoln 
Parish, Louisiana. Four of these schools were eleven-grade con- 
solidated rural schools, the fifth system consisted of two elementary 
schools and a high school located in a town of approximately 5000 
population. The Otis Self-administering Tests of Mental Ability 
were used to measure intelligence, and the Sims Score Card for Socio- 
economic Status was used as a measure of home background. The 
Otis Intermediate Test, Form A, was used for Grades V to VII, and 
the Higher Test, Form A, for Grades VIII to XI. On the basis of the 
scores made on the Otis Tests the IQ’s were determined by the proce- 
dure recommended by Otis in his “‘Manual of Directions.” The 
reliability of the Otis Tests, as reported by the author, is high (.95 
for the Intermediate Form and .92 for the Higher Form). The 
validity is more difficult to determine, but high correlations reported 
between this test and other tests of intelligence would seem to indicate 
that it is measuring approximately the same thing that all of our 
intelligence tests measure. The reliability and validity of the Sims 
Score Card have been discussed at length elsewhere.¹ Suffice it to say 
here that it has been shown to differentiate adequately between groups 
with known differences in home environment; and the reliability, as 
determined by correlating the scores of children from the same home, 
has been found to be rather high, approximately .90. The coefficient 
of reliability determined from one hundred pairs of sixth-, seventh-, 
and eighth-grade children was .95, and the coefficient found by 
correlating the scores of the two hundred three pairs of siblings reported 
below was .87. The relatively low reliability of the data used here 
may be due to the fact that some of the examiners were not very 
skilled in testing, but there is evidence that high school children 
report a slightly higher socio-economic status, perhaps because 
certain of the items included in the scale are actually more frequently 
possessed by older children. This increase is, however, very slight, 
being less than one-third of a point a year, and, although it does 
lower the reliability of the measure, it is still reliable enough for our 
purposes. 

From the tested population described above, each child reporting a 
brother or sister living in the same home and enrolled in one of the 
schools and grades included in the testing (Grades V to XI) was selected 
and his record paired with that of the brother or sister reported. 
These siblings were paired in all possible ways; that is, where three 
siblings were tested they were grouped A with B, A with C, and B 





1 “The Measurement of Socio-economic Status.” Public School Publishing 
Co., 1928. 
with C. More than three siblings were not reported in any case. 
This pairing seems to be the generally accepted one when the number 
of siblings found in any one family is not large. Two hundred twenty- 
four such pairs, coming from one hundred eighty-two homes, were 
secured, but for reasons explained later, twenty-one of these pairs were 
discarded, so that two hundred three pairs from one hundred sixty-one 
homes composed the main group used in this study. 

The unrelated pairs were prepared by substituting for one member 
of each sibling pair an unrelated child coming from the same school, 
and having the same age and home background. The procedure for 
preparing one of these unrelated pairs was as follows: Referring to 
the school where the younger sibling was enrolled, that child with the 
same home background score and most nearly the same age as the 
younger sibling was selected. When two or more children were found 
to fulfill these requirements to the same extent, one was chosen at 
random from the number. In the same manner the child most nearly 
a duplicate of the older sibling was selected. The one of these two 
selected children nearer the age of the sibling which he was selected 
to duplicate was taken as one member of the unrelated pair and the 
other sibling as the second member. In no case was a child selected 
whose age was not within one year of the sibling and whose home 
background score was not within one point of the sibling’s score, or 
the average of the siblings’ scores in cases where there was a difference 
in the reported home background score of two siblings. If no such 
child could be found, the sibling pair was discarded. The twenty-one 
pairs of siblings mentioned above as being eliminated from the study 
were discarded because no unrelated pair could be found to match 
them. 
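
The matching rule may be made concrete with a small sketch. Everything in it is hypothetical: the `Child` record, the function names and the example children are invented for illustration. Ranking candidates first by closeness of home background score and then by closeness of age is one reading of the procedure, and the average of the two siblings' background scores is used as the target throughout. The limits of one year in age and one point in background score are those stated above.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Child:
    name: str
    school: str
    age: float         # years
    background: float  # home background (socio-economic) score
    family: str

def find_duplicate(sibling, target_bg, pool, rng):
    """Unrelated child from the sibling's school who best duplicates the sibling."""
    candidates = [c for c in pool
                  if c.school == sibling.school
                  and c.family != sibling.family
                  and abs(c.age - sibling.age) <= 1.0
                  and abs(c.background - target_bg) <= 1.0]
    if not candidates:
        return None
    def key(c):
        return (abs(c.background - target_bg), abs(c.age - sibling.age))
    best = min(key(c) for c in candidates)
    # Ties are settled at random, as in the text.
    return rng.choice([c for c in candidates if key(c) == best])

def unrelated_pair(young, old, pool, rng):
    """Replace one sibling by the closer-in-age duplicate; None means discard."""
    target_bg = (young.background + old.background) / 2
    dup_young = find_duplicate(young, target_bg, pool, rng)
    dup_old = find_duplicate(old, target_bg, pool, rng)
    options = []
    if dup_young is not None:
        options.append((abs(dup_young.age - young.age), 0, (dup_young, old)))
    if dup_old is not None:
        options.append((abs(dup_old.age - old.age), 1, (young, dup_old)))
    if not options:
        return None
    return min(options, key=lambda t: t[:2])[2]

rng = random.Random(0)
young = Child("Y", "A", 12.0, 14, "fam1")
old = Child("O", "A", 15.0, 14, "fam1")
pool = [Child("P1", "A", 12.2, 14, "fam2"),   # close duplicate of the young sibling
        Child("P2", "A", 15.8, 14, "fam3"),   # looser duplicate of the old sibling
        Child("P3", "B", 12.0, 14, "fam4"),   # wrong school
        Child("P4", "A", 12.0, 20, "fam5")]   # background score too far off
pair = unrelated_pair(young, old, pool, rng)
print([c.name for c in pair])
```

In the example the duplicate of the younger sibling lies nearer in age than the duplicate of the older one, so the unrelated pair consists of that duplicate and the older sibling; with an empty pool the function returns None and the sibling pair would be discarded.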

In this manner a group of two hundred three pairs of unrelated 
children having the same age and home background and coming 
from the same school as the sibling pairs were secured. Within the 
reliability of the measures used they differ from the sibling pairs only 
in the absence of common parentage. Table I presents the average 
age and socio-economic status (with the standard deviation) of the 
two groups, the young members of the pairs being contrasted with the 
old members. In addition, although it is to be noted that they have 
not been paired on these bases, the mean grade and IQ (with the 
standard deviation) are reported. 

That the groups used here are fairly representative of the popula- 
tion from which they were selected is shown by comparing the averages 
and sigmas for intelligence and socio-economic status of the siblings 
and unrelated pairs with the total population. The total population 
tested consisted of 1018 cases and the average IQ was 90.9 (with a 
standard deviation of 15.6), while the average socio-economic status 
was 14.7 (with a standard deviation of 5.9). 


TABLE I.—SHOWING THE MEAN AGE, SOCIO-ECONOMIC STATUS, GRADE AND IQ OF 
THE SIBLING PAIRS AND THE UNRELATED PAIRS 

                              Age           S-E-S          Grade            IQ
                       N   Mean  Sigma   Mean  Sigma   Mean  Sigma   Mean  Sigma

Young member: 
  Siblings ........   203  13.0   2.0    12.4   5.4     7.2   2.1    91.8  15.4
  Unrelated .......   203  13.0   1.9    12.9   5.4     7.3   1.7    91.3  14.7
Old member: 
  Siblings ........   203  15.9   2.1    13.4   5.6     9.1   1.8    87.3  13.6
  Unrelated .......   203  15.9   2.2    13.5   5.4     9.1   1.8    87.5  13.3
Difference between old 
and young: 
  Siblings ........         2.9           1.0            1.9           4.5
  Unrelated .......         2.9            .6            1.8           3.8

Comparison of the sibling and unrelated groups shows their great 
similarity. The difference between the members of the sibling pairs 
(young against old) is seen to be practically the same as the difference 
between the members of the unrelated pairs. The average difference 
in age is approximately three years, and in grade approximately two 
grades; the old members score slightly higher on socio-economic 
status; and the younger members have the higher IQ. One is struck 
by the fact that, although no attempt has been made to equate the 
groups on the basis of either grade or intelligence, there appears to be 
just as much similarity between the two groups on these measures as 
on the other two. 

Certain factors may independently affect the intelligence of mem- 
bers of pairs such as those here used; consequently the degree of 
correlation found between the paired cases is influenced by the method 
used for entry in the correlation table. Because there is yet some 
doubt as to what is the most satisfactory method, in the correlations 
reported here two methods were used. By the first (known hereafter 
as the age-pairing method) the score of the young member was always 
entered on the vertical axis of the correlation table and the score for 
the corresponding old member on the horizontal axis. By the second 
(known as the double-entry method) each pair of scores was entered 
twice; with the young on the vertical axis and then with the old on the 
vertical axis.¹ 
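
The two methods of entry can be sketched as follows, using a handful of invented IQ’s (the scores and function names are illustrative only). Under age pairing each pair is entered once, young against old; under double entry each pair is entered twice, once in each order, so that the resulting coefficient does not depend on which member is called the first.

```python
import numpy as np

def age_pairing_r(young, old):
    """Each pair entered once: young member on one axis, old member on the other."""
    return float(np.corrcoef(young, old)[0, 1])

def double_entry_r(young, old):
    """Each pair entered twice: (young, old) and then (old, young)."""
    x = np.concatenate([young, old])
    y = np.concatenate([old, young])
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical IQ's for six pairs (not data from the study).
young = np.array([95.0, 88.0, 102.0, 110.0, 79.0, 91.0])
old = np.array([90.0, 85.0, 97.0, 101.0, 83.0, 86.0])

r_age = age_pairing_r(young, old)
r_double = double_entry_r(young, old)
print(round(r_age, 3), round(r_double, 3))
```

When the age-pairing coefficient is positive, the double-entry coefficient cannot exceed it, since any difference between the young and old members in mean or spread is treated by double entry as unlikeness within pairs.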

The correlation between the IQ’s of the siblings and the correlation 
between the IQ’s of the unrelated pairs, using the two methods, 
are as follows: 


                       r (AGE PAIRING)     r (DOUBLE ENTRY) 
Siblings ..........      .44 ± .04            .40 ± .04 
Unrelated .........      .35 ± .04            .29 ± .04 
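
The figures following the ± signs are not defined in the text. On the assumption that they are probable errors of r computed by the customary formula P.E. = .6745 (1 − r²)/√N, with N taken as the two hundred three pairs, they can be reproduced as in the sketch below. Both the formula and the choice of N are assumptions made for illustration, and the names are hypothetical.

```python
import math

def probable_error(r, n_pairs):
    """Probable error of a correlation coefficient (assumed classical formula)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n_pairs)

reported = {
    "siblings, age pairing": 0.44,
    "siblings, double entry": 0.40,
    "unrelated, age pairing": 0.35,
    "unrelated, double entry": 0.29,
}
pe = {label: round(probable_error(r, 203), 2) for label, r in reported.items()}
for label, value in pe.items():
    print(label, value)
```

Under these assumptions each of the four coefficients carries a probable error of .04 when rounded to two places, in agreement with the values reported above.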


On the surface, at least, it seems that these correlations, between 
the intelligence of the members of the sibling pairs and between the 
intelligence of the unrelated pairs, should be indications of the rela- 
tive influence of common environment plus common parentage, and of 
common environment only on intelligence as measured by a group test; 
that is, common environment produces a correlation of .35 or .29 
depending upon the method used, while the addition of common 
parentage raises this correlation to .44 or .40 again depending upon 
the method used. The writer is inclined to believe that these are 
the most significant correlations reported, but there are undoubtedly 
certain factors which might influence these coefficients that make it 
undesirable to accept them at face value. It has been pointed out by 
numerous investigators that there is a decided relationship between 
age and intelligence; consequently the coefficients for the members 
of each group are being thus affected. For comparative purposes, since 
the age factor is presumably operating the same in both cases, it is 
perhaps of small matter; but it seemed desirable to see what the correlation 
would become when age is kept constant. Partial correlations 
with age constant were determined for the sibling pairs and for the 
unrelated pairs. In determining these correlations the age pairing 
method was used. These correlations were, for the sibling pairs .48, 
and for the unrelated pairs .34. In other words, this correction for age 





1In using these two methods we are following the procedure of Freeman, 
Holzinger, and Mitchell in the study mentioned above. 
has not seriously affected the relation between the coefficients for 
siblings and for unrelated pairs. 

It will be recalled that two forms of the Otis tests were used for 
testing the intelligence of the subjects. Otis reports a correlation of 
only .84 between the two forms, consequently IQ’s determined from 
different forms are not exactly comparable. In order to determine the 
influence which the use of two forms might be having on the correla- 
tions found, those pairs where the child substituted to make an unre- 
lated pair was not tested with the same form as was the sibling for 
which he was substituted were first eliminated.¹ Coefficients of 
correlation were then determined for: (1) those cases where both 
members of the pair were tested with the intermediate form; (2) 


TABLE II.—COMPARISON OF THE CORRELATION BETWEEN INTELLIGENCE OF 
SIBLINGS WITH UNRELATED WHEN THE SAME FORM WAS USED 

                         r (AGE PAIRING)     r (DOUBLE ENTRY) 

Intermediate form: 
  Siblings ..........      .49 ± .06            .49 ± .06 
  Unrelated .........      .387 ± .07           .32 ± .08 
Higher form: 
  Siblings ..........      .55 ± .06            .48 ± .07 
  Unrelated .........      .43 ± .07            .36 ± .08 
Both forms: 
  Siblings ..........      .45 ± .07            .41 ± .07 
  Unrelated .........      .31 ± .08            .31 ± .08 
Composite: 
  Siblings ..........      .51 ± .04            .49 ± .04 
  Unrelated .........      .39 ± .04            .37 ± .04 


those cases where both members of the pair were tested with the 
higher form; (3) those cases where the young member was tested with 
the intermediate form and the old member with the higher form; 
and (4) the composite of all cases. The correlations between the 
intelligence of siblings are compared with those between the unrelated 
pairs in Table II. Considering the composite of these cases it will be 





1 Since an unrelated pair was made by substituting that child in the school who 
had most nearly the same home background and age, but not necessarily the same 
grade, as the sibling, it sometimes happened that the substituted child had been 
tested with a different form from that used on the sibling. 
noted that although the correlations have been raised slightly there is 
no serious change in the ratio of the unrelated to the sibling. Compar- 
ing the correlation found when the members of a pair were tested by 
the same form with that found when the members of a pair were 
tested one by one form, and the second by the other, we see 
evidence that the use of two forms is probably reducing the correla- 
tion, but we also see that the siblings and unrelated are reduced to the 
same extent. 

A final factor that needs to be taken into consideration in the 
interpretation of the coefficients that have been found in this study 
is that of selection. What would be the effect on the correlation found 
if we had available all of the brothers and sisters of the siblings used? 
What would be the effect if we had all of the children in the community, 
or even in these schools, who were of an age and from such homes as 
to make them suitable for use in the preparation of our unrelated pairs? 
We have made an attempt to answer these questions. Presumably 
pairs with age differences such as those of the cases used here made 
up at random would show no correlation between intelligence. As a 
check on the validity of the method of pairing used here—in order to 
see if it operated in such a way as to select pairs with like intelligence 
independent of the factors which we have assumed to be causing the 
similarity—a third set of paired cases was prepared. These paired 
cases were made by selecting for each sibling that child, in the same 
school (and in the grades included in the study) having most nearly 
the same age as the sibling. When the nearest age was found to be 
represented by several children, one was selected at random. A 
duplicate for each sibling pair was then secured by taking the selected 
child nearer in age to the sibling he was selected to duplicate as one 
member, and the other sibling as the second member. In this manner 
a set of two hundred three pairs of unrelated children having the same 
age and coming from the same school as the siblings was prepared. 
Since they have been paired without regard to either blood relationship
or like home background, the correlation should be zero. The correlation
actually found, by age and by double-entry pairing respectively,
was .05 ± .05, and .04 ± .05. It appears that whatever selection is
operating is very slight, and it is presumably affecting the siblings
and the unrelated in the same way.

To summarize, then, siblings reared in the same home have been 
found to have a correlation of approximately .45 between their intelli- 
gence quotients as determined by a group test, while unrelated pairs 
equated with the sibling pairs on the basis of age and home background,
and attending the same school as the siblings, have shown a correlation 
between their IQ’s of something in the neighborhood of .30. 

Attention should be called to the fact that the crudeness of our 
measure of home background is operating in such a way as to increase 
the difference between these two correlations. The siblings have been 
reared in the same home, and to the extent that this means common 
environment we are sure they have it. The unrelated pairs have been 
reared in different homes, but homes we have assumed to be alike 
because they had the same score on the score card. To assume that 
two children have the same home background because they have 
the same score on this scale is unreliable to the extent that the measure 
is unreliable, but to compare children equated on this basis with 
children actually reared in the same home has an added factor of 
unreliability caused by the failure of the score card to measure all 
of the finer environmental differences found among homes that are 
seemingly equal. The coefficient for the siblings is not subjected 
to this factor, which is almost sure to attenuate the correlation for 
the unrelated pairs. In other words, if we could find unrelated pairs 
that had actually been reared from birth in the same home we would 
expect a correlation between unrelated pairs that would more nearly 
approach that found for siblings. One feels sure that the influence of 
blood relationship on the intelligence of siblings is not being under- 
estimated when the difference between the coefficient found here for 
the siblings and for the unrelated pairs is taken as its measure. Since 
unrelated children who, within the reliability of our measures, have 
been equated with siblings show the resemblance that we have found, 
it seems safe to say that the common environment to which the 
siblings are subjected accounts for a correlation between their IQ’s 
of at least .30, while blood relationship is potent enough to raise this 
coefficient to .45. 

In conclusion we should call attention to the fact that throughout 
this paper we have been considering intelligence as measured by a 
group test. The fact that high correlations have been found between 
various group tests of intelligence, and between group tests and 
individual tests, would lead one to believe that similar results would 
be found whatever the test used. This, however, does not warrant 
the assumption that there is not some quality in the human, some 
native capacity, which is inherited. The results may be interpreted 
as a condemnation of present day ‘intelligence’ tests or as evidence 
that intelligence is not solely inherited, but rather a development due
to the interplay of environmental forces on hereditary characteristics.
The interpretation that one may give the findings is a personal matter,
but the indications are that intelligence, so far as we are today able
to measure it, is greatly influenced by environment.
THE EQUIVALENCE OF JUDGMENTS TO TEST ITEMS
IN THE SENSE OF THE SPEARMAN-BROWN
FORMULA¹


H. H. REMMERS 


Purdue University 


THE PROBLEM 


A previous study by two of my students and myself² reported an
investigation in which a summary of the experimental literature
indicated that “the Spearman-Brown prediction formula shows it to give
meaningful prediction on such materials as mental test items, spelling
words, [judgments of] lifted weights, true-false items in language, and
component units of rating scales.” This previous work was done by
a number of different authors: Kelley, Holzinger, Clayton, Ruch,
Ackerson, Jackson, and Furfey.

Our own experiment at that time reported results on the Purdue 
Personnel Rating Scale which indicated that summations of ratings on 
ten different traits fell within the limits of allowable error when the 
values of empirically observed correlations for varying numbers of 
raters were compared with those predicted by the Spearman-Brown 
formula. That is, the formula did predict with reasonable accuracy 
the reliabilities to be expected with an increase in the number of judges 
or raters. 

The typical situation in which judgments or ratings are obtained
is frequently limited by the fact that, assuming the Spearman-Brown
formula to apply, the number of judges is too small to give ratings
sufficiently reliable for practical purposes.

The present study is the report of an investigation of the Purdue 
Rating Scale for Instructors in which the limitation just mentioned is 
largely absent, since the average instructor is likely to have a hundred 
or more students. On this scale students judge instructors anony- 
mously by checking on a graphic scale the amount of ten different 
traits presumably related to success in classroom teaching possessed 





1 A paper given before Section I of the American Association for the Advance- 
ment of Science, Dec. 28, 1929, Des Moines, Ia. 

2 Remmers, H. H., Shock, N. W., and Kelly, E. L.: An Empirical Study of the 
Validity of the Spearman-Brown Prophecy Formula as Applied to the Purdue 
Rating Scale. Journal of Educational Psychology, Vol. XVIII, No. 3, March, 1927, 
pp. 187-195.
by a given teacher under consideration. In a rating program carried
out at Purdue University during 1928-1929, something over ninety
per cent of the faculty were rated. A number of instructors turned
over to me their marked and scored rating blanks. The question to
be answered was: Do the judgments which students
record concerning their instructors follow the law represented by the
Spearman-Brown prophecy formula? In other words, is it valid to
assume that judgments are the equivalent of test items in the sense
required by the formula?
The formula, by now rather well known in its application to test
construction, is

$$ r_n = \frac{n r_1}{1 + (n - 1) r_1} $$

where $r_n$ = the predicted reliability,
$n$ = the number of times a test is increased by its own length,
and
$r_1$ = the reliability of a unit length of the given test.


In the present problem the unit is the judgment of a single student, 
and the problem is to determine the correspondence of reliability 
coefficients determined for given numbers of such judgments with 
those predicted by the formula. 
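
As a rough illustration of how the formula is applied to judgments, the following Python sketch computes the predicted reliability for a given number of pooled ratings. The function names are illustrative only and do not come from the original study; the unit reliability of .419 is the average 1 vs. 1 correlation for Trait 5 as given in Table VII. The second function simply solves the same formula for the number of judgments and is an addition made here for convenience.

```python
def spearman_brown(r1: float, n: float) -> float:
    """Predicted reliability of n pooled judgments, given unit reliability r1."""
    return n * r1 / (1 + (n - 1) * r1)


def judgments_needed(r1: float, target: float) -> float:
    """Number of judgments at which the formula predicts the target reliability.

    Obtained by solving the Spearman-Brown formula for n.
    """
    return target * (1 - r1) / (r1 * (1 - target))


# Average 1 vs. 1 correlation for Trait 5 (Presentation of Subject-matter).
r1_trait5 = 0.419

# Predicted reliabilities for sums of 5, 10, 15, 20, and 30 ratings.
predicted = {n: round(spearman_brown(r1_trait5, n), 3) for n in (5, 10, 15, 20, 30)}
print(predicted)

# Approximate number of ratings for a predicted reliability of .90.
print(round(judgments_needed(r1_trait5, 0.90), 1))
```

Under these assumptions the predicted values agree with the "Correlation predicted" column for Trait 5 in Table VII.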

If the conditions of the formula were always fully met, perfect 
prediction would result. Practically, however, the conditions are 
never ideally met, and our problem becomes a sampling problem sub- 
ject to the laws of probability. 

It should be recalled that the validity of the formula depends upon 
the homogeneity of the test items, or, statistically speaking, upon the 
equality of intercorrelations between units and upon the equality of 
the standard deviations of these units. 







PROCEDURE 


In the ratings of the instructors used for this investigation there 
was, so far as I know, no selective factor operating in the return of 
these blanks, except that instructors with unusually low ratings might 
have hesitated to return them as readily as those with more satisfactory 
ratings. To the extent that such a factor did operate it would tend 
to reduce the variability of the total distribution of ratings and thus 
to reduce the reliabilities below what they would otherwise have been. 





1 Remmers, H. H.: The College Professor as the Student Sees Him. Bulletin
of Purdue University, Studies in Higher Education, Vol. XI, March, 1929, p. 63. 
From each instructor’s ratings scores were selected according to the
following schema:


20 samplings of 1 vs. 1 rating, for trait 5, Presentation of Subject-matter
 5 samplings of Σ5 ratings vs. Σ5 ratings, for trait 5, Presentation of Subject-matter
 5 samplings of Σ10 ratings vs. Σ10 ratings, for trait 5, Presentation of Subject-matter
 5 samplings of Σ15 ratings vs. Σ15 ratings, for trait 5, Presentation of Subject-matter
 1 sampling of Σ20 ratings vs. Σ20 ratings, for trait 5, Presentation of Subject-matter
 1 sampling of Σ30 ratings vs. Σ30 ratings, for trait 5, Presentation of Subject-matter


Traits 1 and 10, Interest in Subject-matter and Stimulating Intellectual
Curiosity, and also the sums of all traits, were sampled in
exactly the same way, except that only ten 1 vs. 1 correlations were
calculated for these, and that in the case of the sums of all traits the
samplings obtained included only 1 vs. 1, Σ20 vs. Σ20, and Σ30 vs. Σ30.

The results of the samplings are shown in Tables I to VII. 











THE DATA
TABLE I.—DISTRIBUTIONS OF CORRELATIONS 1 VS. 1 RATING FOR THE TRAITS
INDICATED
                 Trait 1    Trait 5    Trait 10    Sums of all traits
60 to .69 ea 4 | ial 2 
50 to 59 1 3 2 3 
40to .49 1 a) 3 2 
30 to 39 5 3 | 2 1 
20 to 29 ashe 3 1 
10to .19 1 “ee 1 1 
0Oto .09 1 | 1 
—.10 to —.01 1 | 
—.20to —.11 | 1 
Number of r's    10      20      10      10
Mean             .290    .429    .354    .320
Median           .344    .450    .363    .503
N*               37      37      37      37














* N in all tables refers to the number of instructors. 


In Table VII are summarized the necessary data for answering the
problem raised at the outset of this study, i.e., whether judgments
under the defined conditions are equivalent to test items in the sense
of the Spearman-Brown formula. The answer is that they are.
There is apparently a slight tendency for the formula to overpredict, 
there being sixteen overpredictions out of a possible seventeen. These 
apparent overpredictions, however, are probably to be explained by the
reduction in the number of instructors necessitated by the lack of
instructors with very large classes. This tended to reduce the “range
of talent,” with a corresponding reduction in reliability. In no single

TABLE II.—CORRELATIONS OF Σ5 VS. Σ5 RATINGS FOR THE TRAITS INDICATED

            Trait 1    Trait 5    Trait 10
            .863       .783       .615
            .799       .782       .576
            .752       .760       .433
            .721       .709       .428
            .705       .644       .345
Mean        .728       .736       .479
Median      .752       .760       .433
N           36         36         36

TABLE III.—CORRELATIONS OF Σ10 VS. Σ10 RATINGS FOR THE TRAITS INDICATED

            Trait 1    Trait 5    Trait 10
            .843       .890       .861
            .832       .856       .781
            .663       .856       .754
            .652       .843       .661
            .596       .782       .530
Mean        .717       .845       .717
Median      .663       .856       .754
N           20         20         20

TABLE IV.—CORRELATIONS OF Σ15 VS. Σ15 RATINGS FOR THE TRAITS INDICATED

            Trait 1    Trait 5    Trait 10
            .961       .973       .902
            .882       .954       .876
            .702       .931       .872
            .621       .904       .821
            .607       .571       .791
Mean        .754       .887       .852
Median      .702       .931       .872
N           7          7          7

TABLE V.—CORRELATIONS OF Σ20 VS. Σ20 RATINGS FOR THE TRAITS INDICATED

            Trait 1    Trait 5    Trait 10    Sum of all traits
            .801       .877       .636        .898
N           13         13         13          40

TABLE VI.—CORRELATIONS OF Σ30 VS. Σ30 RATINGS FOR THE TRAITS INDICATED

            Trait 1    Trait 5    Trait 10    Sum of all traits
            .936       .876       .805        .872
N           10         10         10          16

case, however, is the difference between the ‘‘best’’ or most probable of 
the observed correlations and that of the corresponding predicted 
values clearly statistically significant. In the case of only two 
correlations is the difference divided by its probable error greater than 
two. 
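
The significance check described here can be sketched in Python. This is a reconstruction under stated assumptions, not the author's own computation: the probable error of a difference is taken as the square root of the sum of the squared probable errors, and the probable error of a predicted correlation is taken as .6745 times the standard error given by Shen's formula, with N read as the number of instructors in the row. Both conventions reproduce the tabled values for Trait 10, Σ20 vs. Σ20, in Table VII. The function names are illustrative.

```python
import math


def pe_predicted(r1: float, n: int, n_cases: int) -> float:
    """Probable error of a Spearman-Brown prediction (assumed form of Shen's formula).

    r1      -- reliability of a single judgment
    n       -- number of judgments pooled
    n_cases -- number of instructors on which the correlation rests (assumption)
    """
    return 0.6745 * n * (1 - r1 ** 2) / (math.sqrt(n_cases) * (1 + (n - 1) * r1) ** 2)


def pe_difference(pe_a: float, pe_b: float) -> float:
    """Probable error of the difference between two independent correlations."""
    return math.sqrt(pe_a ** 2 + pe_b ** 2)


# Trait 10, sums of 20 ratings: observed .636 ± .111, predicted .915 ± .055.
observed, pe_obs = 0.636, 0.111
predicted, pe_pred = 0.915, 0.055

diff = predicted - observed
pe_diff = pe_difference(pe_obs, pe_pred)
ratio = diff / pe_diff

print(round(pe_predicted(0.354, 20, 13), 3))        # PE of the predicted value
print(round(diff, 3), round(pe_diff, 3), round(ratio, 1))
```

A ratio only slightly above two, as in this row, is what the text treats as not clearly significant.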

On the face of the data, it seems surprising that the summation of
all traits should give no higher reliability than was observed. This,
however, is probably to be explained by the fact that the raw trait
scores are incommensurable. Had these scores been first reduced to
standard measures or percentile equivalents, the resulting reliabilities
would very probably have been increased materially.


SUMMARY AND CONCLUSIONS 


Samplings of judgments for varying numbers of judges selected at 
random yielded reliability correlations within the allowable error when 
the judges are undergraduate students and the things judged are 
classroom personality traits of instructors who are appreciably different 
in that they possess, in the judgments of students, different amounts 
of these traits. The number of correlations upon which this study 
was based is one hundred three. The following generalizations seem 
warranted from these and previously reported data: 

1. Reliabilities are predicted within the allowable error up to
thirty judgments.

2. The three traits sampled in this investigation vary significantly 
in reliability. Stimulation of Intellectual Curiosity, for example, 
means more different things to students than does the trait Presenta- 
tion of Subject-matter. 

3. In general, ratings by from ten to twenty students on a single
trait for instructors differing sensibly in the amount of the trait
possessed yield reliabilities which compare rather favorably with the
reliabilities reported for standardized mental and educational
tests.

4. It is probable that in the majority of situations in which
subjective judgments are used (personnel ratings, stock judging, debate
judging, beauty contests, jury verdicts, political polls, etc.) the
Spearman-Brown prophecy formula indicates the number of judgments
required for a given reliability, although here it must be admitted that
we are going beyond the known facts.

TABLE VII.—CORRELATIONS OBSERVED AND PREDICTED

Samplings        No. r's     No. of         Average        Correlation     Diff.    PE diff.    Diff. /
                 computed    instructors    observed       predicted¹                           PE diff.

Trait 5. Presentation of subject-matter
1 vs. 1          20          37             .419 ± .091
Σ5 vs. Σ5         5          36             .736 ± .052    .783 ± .064     .047     .082         .6
Σ10 vs. Σ10       5          20             .845 ± .043    .878 ± .054     .033     .069         .5
Σ15 vs. Σ15       5           7             .887 ± .053    .915 ± .067     .028     .085         .3
Σ20 vs. Σ20       1          13             .877 ± .043    .935 ± .038     .058     .057        1.0
Σ30 vs. Σ30       1          10             .876 ± .050    .956 ± .030     .080     .058        1.4

Trait 10. Stimulating intellectual curiosity
1 vs. 1          10          37             .354 ± .097
Σ5 vs. Σ5         5          36             .479 ± .086    .733 ± .086     .254     .122        2.1
Σ10 vs. Σ10       5          20             .717 ± .073    .846 ± .077     .129     .106        1.2
Σ15 vs. Σ15       5           7             .852 ± .070    .892 ± .096     .040     .119         .3
Σ20 vs. Σ20       1          13             .636 ± .111    .915 ± .055     .279     .124        2.3
Σ30 vs. Σ30       1          10             .805 ± .075    .943 ± .045     .138     .087        1.6

Trait 1. Interest in subject-matter
1 vs. 1          10          37             .290 ± .102
Σ5 vs. Σ5         5          36             .728 ± .053    .671 ± .110     .057     .122         .5
Σ10 vs. Σ10       5          20             .717 ± .073    .803 ± .106     .086     .129         .7
Σ15 vs. Σ15       5           7             .754 ± .110    .860 ± .137     .106     .176         .6
Σ20 vs. Σ20       1          13             .805 ± .066    .891 ± .081     .087     .105         .8
Σ30 vs. Σ30       1          10             .936 ± .026    .925 ± .066     .011     .071         .2

Sums of all traits
1 vs. 1          10          37             .320 ± .096
Σ20 vs. Σ20       1          40             .898 ± .021    .904 ± .038     .006     .043         .1
Σ30 vs. Σ30       1          16             .872 ± .040    .933 ± .043     .061     .059        1.0

¹ The probable errors of these r’s were calculated by means of Shen’s formula,
$PE_{r_n} = \dfrac{.6745\, n (1 - r_1^2)}{\sqrt{N}\,[1 + (n - 1) r_1]^2}$. See his
article, A Note on the Standard Error of the Spearman-Brown Formula. Journal of Educational
Psychology, Vol. XVII, 1926, pp. 93-94; also Douglas, Harl A., A Note on the Correctness of Certain
Error Formulas. Journal of Educational Psychology, Vol. XX, 1929, pp. 434-437.
SEMI-LOGARITHMIC VERSUS LINEAR PLOTTING
OF LEARNING CURVES 


RICHARD W. HUSBAND 


University of Wisconsin 


The usual methods of plotting learning curves, say of time or of 
errors, using linear functions in both variables, have certain limitations. 

1. The actual shape of the curve will vary considerably with differ- 
ent spatial separation of successive units. If the spread is wide, the 
whole curve will be rounded out, approaching the so-called ‘‘typical 
concavity.” If it is small, the curve will be flattened out near the 
base line. 

2. The unit gains tend to be artificial, and dependent on the arbi- 
trary intervals chosen. Reduction from ten to nine errors is not 
nearly as important or as difficult to accomplish as from two to one 
or from one to zero, yet on a linear diagram the two appear equal. 

3. A further difficulty lies in making objective and decisive 
comparisons between curves. Aside from theoretical and technical 
interest, the chief functions of quantitative experimentation, including 
the subsequent statistical treatment, are to compare the relative 
performances or efficiencies of different groups or techniques. Exam- 
ples are such questions as two different spacings of learning periods, 
dietary or drug conditions, or comparisons between several species. 

Just as Thorndike and Koffka interpret stupidity and insight 
respectively from the same curves, so may two investigators read 
clear superiority and negligible differences from a single set of com- 
parative curves. The same ambiguity is present in comparing means 
unless we use probability figures derived from the standard error 
of the difference between the means. With linear plotting we do 
not have available any single figure to use in expressing entirely apart 
from personal opinion the differences between performances in learn- 
ing, especially considering that we are dealing with a continuous 
temporal series. We are more or less limited to admiring the curves 
from an aesthetic standpoint, and passing a few judgments from 
inspection. 

Occasionally an ambitious investigator has attempted to fit the 
curves he has obtained to an equation, in order to express the progress 
of learning in a single figure or in a single function. This is all very 
well, but such equations are usually highly specialized, and are only 
suitable for the single set of data. They often involve advanced
mathematics, a tool not available to all experimenters: logarithms,
trigonometry, or calculus, and difficult-to-determine constants of only
individual application.

All these difficulties melt away at once if one plots the same data 
on semi-logarithmic paper. The horizontal axis or abscissa is used 
for successive trials, as usual, and is kept on a linear basis, as each 
trial is by nature equally separated from all others. The vertical axis, 
or ordinate, is scaled logarithmically, and the score under considera- 
tion, errors, time, or units done, as the case may be, is plotted on this. 
In some cases, owing to the manner in which the paper is printed, 
the original scores will have to be converted into percentages, say of 
initial or of final performance. This, however, is no serious handicap, 
as the original unit scores may be written on the axis as well. No 
knowledge of logarithms is needed; plotting is done as directly as 
when linear paper is used. The graph paper itself takes care of 
this function. 


CHART I.—Linear learning curves. (Ordinate) average number of errors. (Abscissa)
trial.


This procedure disposes of the difficulties we have raised as follows: 
As to the first, the shape of the curve, there will be no difference, no 
matter how it is plotted. Regularly progressive learning will invari- 
ably result in a straight line. The second objection is disposed of, 
as the fundamental principle of logarithms is proportion rather than 
absolutes. On this basis equal proportional gains are represented 
equally, no matter at what point in the learning process. Thus,
reduction from eight to four errors does not show up as any greater than 
does later improvement from four to two, or from two to one. There- 
fore, if an individual learns a constant fraction of the material on each 
trial the successive points will make a straight line when connected
together. This leads us directly to solution of the third defect of the 
former method, that of objective comparisons between curves. If 
the two or more curves under consideration take practically straight 
lines, we can compare progress made by the various groups by quoting 





CHART II.—Semi-logarithmic learning curves. (Ordinate) average number of errors.
(Abscissa) trial.


the number of trials required by each to reduce by half the error or 
time score, or to double the output. 
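
The halving figure just described can be obtained numerically as well as by reading it from semi-logarithmic paper. The following Python sketch is an illustration only: it fits a straight line to the logarithm of the error score by ordinary least squares (a substitute for drawing the line by eye, which is the procedure the text describes) and converts the slope into the number of trials needed to halve the errors. The error scores are hypothetical, constructed so that one tenth of the remaining errors is eliminated on each trial; they are not the maze data reported here.

```python
import math


def fit_semilog(errors):
    """Least-squares line through the points (trial, log10(error)).

    Trials are numbered 1, 2, 3, ... in the order the scores are given.
    Returns (slope, intercept) in log10 units per trial.
    """
    trials = range(1, len(errors) + 1)
    logs = [math.log10(e) for e in errors]
    k = len(errors)
    mean_t = sum(trials) / k
    mean_l = sum(logs) / k
    sxx = sum((t - mean_t) ** 2 for t in trials)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(trials, logs))
    slope = sxy / sxx
    return slope, mean_l - slope * mean_t


def halving_trials(slope):
    """Number of trials over which the fitted error score falls by half."""
    return -math.log10(2) / slope


# Hypothetical scores: 12 errors at the start, 10 per cent fewer on each trial.
hypothetical_errors = [12 * 0.9 ** k for k in range(20)]

slope, intercept = fit_semilog(hypothetical_errors)
print(round(halving_trials(slope), 2))
```

Two groups can then be compared by quoting their halving figures side by side, provided each set of points lies reasonably close to its fitted line.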

We include learning curves drawn from the same data by the 
two methods. The data were obtained by the writer from a group 
of eighty subjects who learned a maze of moderate difficulty. In 
both cases the solid line represents the figures for the full group (eighty), 
the broken line those (twenty-seven) in the group who learned by a 
purely ideational method, and the line made up of a dash and two
dots the motor learners (fifteen). The data were smoothed by a
moving average of three trials, in order to reduce minor fluctuations.
This in no way makes any difference in the comparisons, since both
sets of curves were treated alike. The straight lines drawn in the
logarithmic data are only approximate, it must be admitted, but
the deviations of the individual points are seen to be remarkably
small.

Let us now give sample interpretations of the two sets of curves. 

(A) Linear Method.—(1) All groups start at about the same 
level. (2) The motor learners soon are at a disadvantage, while those 
using an ideational attack do better than average. (3) The differences 
appear to widen up to about the tenth trial, after which there is 
doubt. 

(B) Logarithmic Method.—(1) and (2) the same as under (A).
(3) The absolute differences, as well as relative, are constantly
widening, and as far as the twentieth trial there is no tendency to narrow
down. (4) Learning, in this problem at least, is characterized by a
proportionate reduction of errors, which shows up in this chart
as a straight line. (5) Judging by reduction of errors from six to three
(or any other arbitrary limits) the average individual halves his
inaccuracies in 6½ trials, the ideational learner in 444, and the motor
learner in fourteen attempts.

Do not these last figures alone mean much more than any amount 
of verbal discussion, description, and estimate? In other words, 
we lose nothing but gain a great deal. Yet no complicated mathe- 
matics is needed in order to make one’s interpretations and compari- 
sons. All in all, plotting data on a logarithmic basis gives us clearly 
objective and concise ways of interpreting data. 

The writer does not claim that all learning curves will form a 
straight line if plotted this way. But in trying sixteen different sets 
of figures from maze learning records, there was not one in which the 
points did not hover within the normal variability limits of a frequency 
on either side of the lines which were drawn. Slight curvilinearity 


would not stand in the way of interpretations on the basis we have 
described. 
Fundamentals of Educational Psychology, by I. M. Gast and H. C.
Skinner. New York: Benj. H. Sanborn and Co., 1929. Pp. 
XIII + 354. 


Changing viewpoints in educational psychology necessitate fresh 
treatments of the subject. The aim of the authors of the present 
volume has been to relate the subject more closely to classroom 
practice, to include the social and applied aspects, and to present the 
best of new opinion and evidence of new movements in education and 
psychology. Their account is expressed in terminology that the 
average student can understand, uncomplicated by highly technical 
details. Educational applications are made for every topic discussed. 
The most recent reliably determined evidence in support of statements 
is consistently referred to. The authors state that they have not 
attempted to present a “‘logical system of psychology.” They have 
gleaned from a wide variety of sources and from varied schools of 
psychology explanations of phenomena which appear to them to be 
most authentic. The result is a presentation of the subject that 
contains much of the older material, the more familiar topics one is 
accustomed to seeing in a standard text, together with new points 
of view and an occasional new chapter: “Mental Health,” “Childhood
and Adolescence,” “The Gifted Child,” “Nature and Nurture,”
“Intelligence—Its Nature and Measurement,” “Educational Measurements.”
A list of the psychologists who have contributed most to
education is given, and a unique feature of the book is the inclusion 
of their photographs. Questions and problems for further study, a 
list of references cited in the text, and suggested readings follow each 
chapter. 

The authors have avoided critical discussion of divergent opinions 
and antagonistic theories of different schools of psychology. The 
newer interpretations of gestalt psychology are scarcely mentioned. 


Description of the physiological bases of behavior has been limited 
to a few of the most essential facts which are stated in the introductory 
chapter. The authors appear to be committed to a “conditioned 
reflex’”’ interpretation of learning, although the discussion is not at 
all times consistent. Their explanations of learning, habit formation, 
and skill impress the reader as being the least commendable of all
the topics discussed. The educational applications contain many 
excellent ideas which might well have been expanded or more fully 
explained. Too often they are in the form of advice which is not 
closely related to the psychological principles presented and not 
pertinent to the subject discussed. The result is occasional lack of 
coherence. These defects are minor and do not detract from the
usefulness of the book as a source of material for the student of
educational psychology.                    GERTRUDE HILDRETH.
The Lincoln School of Teachers College.





Studies in The Organization of Character, by Hugh Hartshorne, Mark 
A. May, and Frank K. Shuttleworth. New York: The Macmillan 
Co. Pp. 503. 


This volume brings to a close the work of the “Character Education
Inquiry.” It follows “Studies in Deceit” and “Studies in Service
and Self-control,” and is in many respects the most important volume
of the three. It contains the following parts: I. Social Intelligence 
and Social Attitude; II. Interrelations of the Factors of Character; 
III. Components of Character; IV. The Significance of Integration; 
V. Conclusions of the Character Education Inquiry. 

Those who have followed the work of the ‘‘Character Education 
Inquiry” will recall that the investigation showed that deception, 
helpfulness, cooperation, persistence, and inhibition were groups of 
specific habits, rather than general traits: ‘‘One could hardly predict 
from what a person did in one situation what he would do in a different 
situation.”” There are, however, obvious differences in individuals 
in the degree to which their behavior is of a consistent pattern. It is 
chiefly with the study of this inner consistency that this volume is 
concerned. 

To eight hundred fifty children in the fifth to eighth grades of three 
school groups (or composites) were given a large battery of tests, 
including tests of moral knowledge and attitudes, tests of deception, 
of cooperation, of inhibition, of persistence, of suggestibility, of 
“nervous instability,” and various rating schemes for determining
“level of social functioning,” reputation, and the like. Measures of
intelligence and home background were also available. 

The correlations between the various objective tests, even when
corrected for attenuation, showed little relationship between gross trait
scores. For example, honesty correlates with service .439, with
inhibition .487, with persistence .166. “This meager consistency in
behavior of good or socially desirable types may seem lamentable,
but we shall have to conclude that consistency is not characteristic
of children . . . We cannot infer from the conduct tests the presence
of a general factor.”

The effect of group standards on conduct was investigated, and 
while there was little relationship between an individual’s ethical 
understanding and his conduct (aside from the relationship existing 
through intelligence), there were definite relationships (aside from 
intelligence) between knowledges of right and wrong and actual 
conduct when classroom groups were considered as a whole. 

Probably the most immediately valuable part of this report is that 
dealing in a very detailed manner with the integration of honesty. 
Other of the character traits might have been investigated, but the 
honesty tests were used since they were the most comprehensive 
battery available. It will be remembered that the intercorrelation
between the various honesty tests was very low, leading necessarily 
to the doctrine of specificity of reaction in this field. Now if the 
scores of a child in each test be distributed, and a measure of dispersion 
computed, we have a measure of the degree to which the child tends 
to be consistent in his behavior—be it honest or dishonest. Twenty- 
one tests were used. For each child the SD of his scores was computed. 
This is called his index of integration. Thus a perfectly honest 
(or dishonest, or 50 per cent honest) child would have an index of 
0.00. When the unreliability of the tests used was compensated for, 
there still remain large differences in integration. These integration 
indices were correlated with total honesty score. The correlation, 
corrected for attenuation, is about .80. This means that the more 
honest a child is the more consistent is he in his honesty. The possi- 
bility that this result is an artefact of the nature of the testing set-up 
is interestingly and fairly discussed, pro and con. It is concluded that 
in so far as the test situation is responsible for the correlation, it is 
not artificial in that the same factors which make the more honest 
children more consistent in their test results also are operative in the 
life situations calling for honesty. 
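
The index of integration described above can be illustrated with a short sketch. It is not taken from the report: the scores, the reliability values, and the function names are hypothetical, and the correction shown is the standard Spearman formula for attenuation, which the review mentions but does not spell out. The sketch assumes each test score has already been put on a common scale.

```python
import math
from statistics import mean, pstdev


def integration_index(scores):
    """SD of one child's scores across the tests (0.0 = perfectly consistent)."""
    return pstdev(scores)


def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Spearman's correction: observed r divided by sqrt of the two reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical data: four children, four tests, scores on a common 0-1 scale.
scores = {
    "A": [1.0, 1.0, 1.0, 1.0],  # same score on every test
    "B": [1.0, 0.5, 1.0, 1.0],
    "C": [1.0, 0.0, 1.0, 0.5],
    "D": [0.0, 1.0, 0.0, 0.5],
}

indices = [integration_index(s) for s in scores.values()]
totals = [sum(s) for s in scores.values()]

r_observed = pearson_r(indices, totals)
# The reliabilities below are assumed values chosen only for illustration.
r_corrected = correct_for_attenuation(r_observed, rel_x=0.7, rel_y=0.8)
print(round(r_observed, 3), round(r_corrected, 3))
```

Because a larger SD means less consistency, a correlation computed this way between the raw index and the total score comes out negative when the more honest children are the more consistent; the figure of about .80 quoted above presumably refers to the size of the relationship between honesty and consistency.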

The implications of the above are interesting. Although moral 
conduct is largely specific, it is shown that as conduct is 
improved there results a general improvement. It is the behavior 
of the dishonest child which can be most readily modified. 

It is difficult adequately to appraise a study so vast in its scope 
as this work. It is certain, however, that this Inquiry is of really 
tremendous importance to education. And the present volume is 
easily the most interesting and valuable to the non-research person 
in the educational field. DONALD SNEDDEN. 

New York University. 





Research Methods and Teachers’ Problems, by Douglas Waples and 
Ralph W. Tyler. A Manual for Systematic Studies of Classroom 
Procedure. New York: The Macmillan Co., 1930. Pp. XXIII + 653. 


The solution of administrative and instructional problems by 
research methods is an important aspect of modern educational 
practice. The research worker now has at his disposal a group of 
publications describing research technique to which the present 
book, designed as a text for teachers, administrators and technicians, 
is an important addition. Research is interpreted by the authors 
in a broad sense as including not only laboratory experimentation but 
the study of service problems such as the teacher or administrator 
meets in operating a school. The authors, however, distinguish 
between these two types of research, pointing out the narrower scope 
and more intensive approach in laboratory studies. Such a problem 
as testing and classifying a group of pupils is considered research 
though distinguished from laboratory experiments in measurement 
and classification. 

The authors have devised a working plan for research projects in 
general, divided into several major steps. The types of data needed 
for typical problems are listed, available sources of data are 
enumerated, and methods of obtaining data from available sources are 
outlined. The research problems commonly met by educators are 
classified in three groups: problems of the curriculum, of teaching 
methods, and of school management. For each of these three classifi- 
cations appropriate methods of attack are outlined and illustrative 
studies are presented in outline form. A list of sixteen different 
techniques or methods of approach with an evaluation and description 
of the method of applying each concludes the book. Adequate 
bibliographies are provided at the close of major topics. 

No book on research methods can of itself produce experts in 
educational research, just as the expert cook is not necessarily the 
product of a cook book; but just as the cook’s technique improves with 
training in method, so may the research student improve his experi- 
ments with increased training in technique. To follow any treatise 
on research methods slavishly would be to defeat the very purpose of 
research. In using the book under discussion some students may be 
inclined to accept too readily the interpretations of the authors and the 
carefully detailed outlines without exercising originality or independ- 
ence of judgment. When the outlined pattern fails to fit, the student 
must be resourceful and intelligent enough to alter or adapt it. The 
authors unfortunately give no discussion of the use of research in 
modernizing educational methods. GERTRUDE HILDRETH. 

The Lincoln School of Teachers College. 











